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ESTIMATION  OF  POSTERIOR  PROBABILITIES 
USING  MULTIVARIATE  SMOOTHING  SPLINES  AND 
GENERALIZED  CROSS-VALIDATION 


Miguel  Agustin  Villalobos 


A  thesis  under  the  supervision  of  Professor  Grace  Wahba 


A  nonparametric  estimate  for  the  posterior  probabilities  in  the 


classification  problem  using  multivariate  smoothing  splines  is  proposed. 
This  estimate  presents  a  nonparametric  alternative  to  logistic  discrimi¬ 


nation  and  to  survival  curve  estimation.  It  is  useful  in  exploring  proper¬ 


ties  of  the  data  and  in  presenting  them  in  a  way  comprehensible  to  the 
layman. 

The  estimate  is  obtained  as  the  solution  to  a  constrained  minimization 
problem  in  a  reproducing  kernel  Hilbert  space.  It  is  shown  that  under  certain 
conditions  an.  estimate  exists  and  is  unique.  <sT - - - “  "  - 

A  Monte  Carlo  study  was  done  to  compare  the  proposed  estimate  with  the 
two  parametric  estimates  most  commonly  used.  These  parametric  estimates 
are  based  on  the  assumption  that  the  distributions  involved  are  normal.  The 
spline  estimate  performed  as  well  as  the  parametric  estimates  in  most  cases 
where  the  distributions  involved  were  normal.  In  the  case  of  non-normal  distri¬ 


butions  the  spline  performed  consistently  better  than  the  parametric  estimates. 
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The  computational  algorithm  developed  can  be  used  in  the  more  general 
context  of  estimating  a  smooth  function  h.  when  we  observe 
Zi  =  Lj/i  +  Ci,  i  =  l,n,  where  c»'s  are  independent,  zero  mean  and  finite  vari¬ 
ance  random  variables,  Li's  are  linear  functionals,  and  the  solution  is  known  a 
priori  to  be  in  some  closed,  convex  set  in  the  Hilbert  space,  for  example,  the  set 
of  non-negative  functions,  or  the  set  of  monotone  functions.  This  type  of  prob¬ 
lem  arises  in  areas  such  as  cancer  research,  meteorology  and  computerized 
tomography. 

We  also  consider  the  estimation  of  the  logarithm  of  the  likelihood  ratio  by  a 
penalized  likelihood  method.  Existence  and  uniqueness  of  an  estimator  under 
certain  conditions  is  shown.  However,  a  data  based  method  to  estimate  the 
"correct"  degree  of  smoothness  of  the  estimator  is  not  given. 
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CHAPTER  1 


INTRODUCTION 

1. 1  Scope  and  Background. 

Since  the  early  work  of  Fisher  (1936)  considerable  advances  have  been 
made  in  the  practical  application  of  statistical  classification  techniques.  Host  of 
the  work  in  discriminant  analysis  for  continuous  variables  is  based  on  the 


assumption  that  the  distributions  involved  are  multivariate  Normal,  however,  in 
the  last  ten  years,  nonparametric  methods  have  received  considerable  atten¬ 
tion.  mainly  because  of  the  availability  of  cheaper  and  faster  computers. 

In  this  work  we  will  be  concerned  with  the  nonparametric  estimation  of  the 
posterior  probabilities  used  in  Bayesian  discriminant  analysis.  The  problem  of 
estimating  these  posterior  probabilities  is  solved  as  a  particular  case  of  the 
more  general  statistical  smoothing  problem  of  estimating  a  smooth  function 
subject  to  inequality  constraints. 

Wahba  (1979b),  introduced  the  multidimensional  smoothing  spline  in  the 
statistical  literature  as  a  tool  to  model  a  smooth  but  otherwise  unknown  func¬ 
tion.  She  devised  the  method  of  Generalized  Cross-validation  for  choosing  the 
parameter  that  controls  the  smoothness  of  the  spline  based  on  the  data. 

Wahba  (I960)  points  out  the  need  for  a  computational  algorithm  to  solve 

statistical  smoothing  problems  with  linear  constraints.  This  need  is  also  pointed 

out  by  Wegman  and  Wright  (1963),  who,  In  the  context  of  isotonic  regression  say: 

Computational  algorithms  are  dearly  the  stumbling  block  in  further  develop¬ 
ment  of  the  theory  of  isotonic  splines.  When  such  algorithms  become  available 
we  believe  that  smooth,  order-preserving  non- parametric  estimators  will  sub- 


z 


stantially  enhance  the  efficiency  of  estimation  procedures  currently  in  use. 

In  this  thesis  we  demonstrate  the  feasibility  of  doing  large  multidimensional 
smoothing  problems  with  inequality  constraints.  The  computational  algorithm 
developed  here  can  be  used  in  applications  such  as  survival  curve  estimation, 
logistic  regression  and  the  estimation  of  posterior  probabilities.  We  examine 
properties  of  the  constrained  smoothing  spline  by  a  Monte  Carlo  study. 

We  believe  that  the  methods  presented  in  this  thesis  will  be  useful  for 
exploring  properties  of  the  data  and  presenting  them  in  a  way  comprehensible 
to  the  layman. 

In  chapter  2  we  will  present  the  estimate  of  the  posterior  probabilities  as 
the  solution  to  a  constrained  optimization  problem  in  some  suitable  space  of 
"smooth"  functions.  We  will  describe  the  method  of  generalized  cross-validation 
for  constrained  problems  to  choose  the  smoothing  parameter. 

In  chapter  3,  we  discuss  the  details  of  the  actual  computation  of  the  spline 
and  a  step  by  step  algorithm  is  given. 

In  chapter  4-  some  simulation  results  are  presented  to  compare  the  spline 
estimate  of  the  posterior  probabilities  with  the  parametric  estimates  most  com¬ 
monly  used.  We  also  present  an  example  with  real  data. 

In  chapter  5,  the  estimation  of  the  logarithm  of  the  likelihood  ratio  is  con¬ 
sidered  using  a  Penalized  Likelihood  approach.  The  one  dimensional  results  of 
Silverman  (1976)  are  extended  to  multiple  dimensions  and  the  existence  and 
uniqueness  of  an  estimator  under  certain  conditions  is  established. 

Finally,  in  chapter  6  we  present  some  concluding  remarks  and  possible 
directions  tor  future  research. 

Most  of  the  mathematical  optimization  background  and  functional  analysis 
results  used  throughout  this  work  are  presented  in  appendix  Al.  The  listing  of 


the  documentation  for  the  routines  that  estimate  and  evaluate  the  constrained 
spline  is  given  in  appendix  A2. 


1.2  Conventions  anu  Notation 


Each  symbol  used  is  defined  at  its  first  occurrence.  Vectors  are  usu¬ 
ally  denoted  by  lower  case  letters  and  no  sub-tildes  are  used.  Matrices  are  usu¬ 
ally  denoted  by  capital  letters  and  are  defined  by  giving  their  i01  row  and  3  th 
column  entry  in  parenthesis  as  in  the  following  example: 

B  :=  (by)  i=l . n ;  ;  =  1 . m 

The  expression  above  defines  a  matrix  B  as  an  n xm  matrix  with  i**  row  and  3th 
column  given  by  fty . 

The  identity  matrix  of  size  nxn  is  denoted  by  In  and  denotes  am 
nxm  zero  matrix.  The  subscripts  n  and  m  for  I  and  0  are  dropped  when  it  is 
clear  from  the  context  of  the  expression  what  they  should  be. 

The  element  (covariate)  of  an  observations  y  will  be  denoted  by 
y  (i)  and  all  vectors  will  be  column  vectors,  for  example, 

fyU)l 


If  A  is  a  matrix,  then  A1  denotes  its  transpose  and  A~l  denotes  its 
inverse.  The  trace  of  a  matrix  A  will  be  denoted  by  fr(i4).  If  x  is  a  point  in  R? 

_  d 

then  ||z  ||  =  V<x,x>  is  the  Euclidean  norm  of  x  and  <x,y>  =  £*(t)i/(i). 

i*l 

If  h.  is  some  function  from  R  to  R  then  denotes  the  k01  derivative 
and  when  k=  1  or  2  we  will  simply  write  h !  and  h" . 

If  f  and  g  are  members  of  some  Hilbert  space  H,  the  inner  product  of 
f  and  g  is  written  as  </  ,9  >n  and  the  norm  is  written  as  ||/ 1|  y.  The  subscript  H 


is  dropped  when,  from  the  context  of  the  expression,  it  is  clear  where  the  inner 
product  or  norm  are  computed. 

Equation  (2.3.4)  refers  to  the  4th  numbered  equation  in  section  3  of 
chapter  2.  In  the  text  the  equation  is  referred  to  as  (2.3.4).  Theorem  (5.1.5) 
refers  to  the  5th  theorem  in  section  1  of  chapter  5  and  similarly  for  lemmas 
propositions  and  definitions. 

1.3  Previous  work  in  nonparametric  discriminant  analysis 

Consider  k  populations  . A^  and  a  d-dimensional  random  vector 

Jf=(A'(l)t...>A'(d))t .  Assume  that  the  probability  distribution  of  X  given  that  it 

comes  from  population  Aj  ,  j=l . k  is  absolutely  continuous  with  respect  to 

Lebesgue  measure  and  let  fj  (x)  denote  the  corresponding  probability  density 
function  for  jsl....,k. 

Suppose  that  a  training  sample  Xij~Xij,  ,  from  the  population  Aj 

is  available  for  each  j  =1 . k.  Given  these  training  samples  and  the  prior  pro¬ 

babilities  j- 1 k  where  0  <  qj  <  1  for  j=l k  and 

i 

we  want  to  estimate  the  posterior  probabilities 

Py(*)  =  ?>/>(*)/  £  ?</<(*)  =  P(Aj  !*=*)  ;  =  1 . k. 

<■  l 

The  estimates  of  these  posterior  probabilities  have  a  clear  application  in  Bayes 
discriminant  analysis. 

In  this  thesis  we  propose  a  class  of  optimization  methods  for  estimating 
7>i(x),...,jDk(z).  For  simplicity  of  notation  we  will  consider  the  case  where  we 
have  only  two  populations  since  the  extension  for  more  than  two  is  straight- 


forward. 

Most  of  the  work  in  discriminant  analysis  (for  continuous  variables)  is  based 
on  Normality  assumptions,  usually  with  equal  covariance  matrices.  For  a  sum¬ 
mary  of  the  work  in  discriminant  analysis  see  Lachenbruch  and  Goldstein  (1979). 
Here  we  will  only  be  concerned  with  nonparametric  discriminant  analysis. 

Fix  and  Hodges  (1951)  are,  to  our  knowledge,  the  first  to  consider  the  non¬ 
parametric  classification  problem  using  a  k-nearest  neighbor  approach.  For 
further  references  related  to  this  paper  see  Lachenbruch  and  Goldstein  (1979). 

During  the  last  10  years  there  has  been  a  development  of  classification 
rules  based  on  density  estimates.  These  kinds  of  rules  are  important  because  of 
the  extensive  research  done  in  nonparametric  density  estimation.  Another 
feature  that  makes  these  kinds  of  methods  attractive  is  a  result  by  Glick  (1972) 
that  says  that  an  estimate  of  the  non-error  rate  of  an  arbitrary  rule  based  on 
parametric  or  nonparametric  density  estimators  is.  in  some  sense  asymptoti¬ 


cally  optimal  provided  that: 


$</*(*)  “*  9 ifi(x) 


pointwise  for  almost  all  x  in  R?,  i=l . k,  and 

r  *  -  -  p 

f  £  ?»/i  -  I- 

B *  <«1 

Kernel,  maximum  penalized  likelihood  and  orthogonal  series  density  esti¬ 
mates  are  among  the  most  popular  methods.  All  these  density  estimation 
methods  involve  the  choice  of  a  parameter  that  controls  the  degree  of  smooth¬ 
ness  of  the  estimate.  Several  methods  have  been  proposed  to  choose  the 
smoothing  parameter,  among  these  there  are  three  which  are  readily  comput¬ 
able  and  objective.  Two  of  these  methods  were  suggested  by  Wahba  (1977  and 
1961b)  and  the  third  by  Habbema,  Hermans  and  Van  den  Broek  (1974).  In  this 


6 


last  paper  the  authors  estimate  the  densities  for  each  population  using  a  kernel 
estimate.  A  complete  description  of  kernel  methods  can  be  found  in  Tapia  and 
Thompson  (197B). 

The  kernel  estimate  used  in  Habbema,  Hermans  and  Van  den  Broek  (1974) 
is  erf  the  form 

(1.3.1) 

for  j=l,...,k,  where,  as  before  d  is  the  dimension  of  the  vector  z  and  k  is  the 
number  of  populations.  K  is  a  multivariate  normal  kernel  and  the  smoothing 
parameters  Oj  are  estimated  by  maximizing  what  might  be  called  the  "cross- 
validation  likelihood  function": 

AH 

where  fj  is  an  estimate  of  fj  computed  as  in  (1.3.1)  but  leaving  out  the  point 
Xji .  For  a  detailed  description  of  the  algorithm  to  carry  out  this  kernel  discrim¬ 
inant  analysis  see  Hermans  and  Habbema  (1976). 

Hermans  and  Habbema  (1975)  compare  five  methods  for  estimating  poste¬ 
rior  probabilities  using  some  medical  data  for  which  the  true  posterior  probabil¬ 
ity  function  is  unknown.  Four  of  these  five  methods  are  parametric  and  the  fifth 
one  is  the  kernel  method  described  above.  The  four  parametric  methods 
involve: 

(1)  Multinormal  distributions,  equal  covariance  matrices,  estimated  parame¬ 
ters. 

(2)  Multinormal  distributions,  equal  covariance  matrices,  Bayesian  or  predic¬ 
tive  approach. 
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(3)  Multinormal  distributions,  unequal  covariance  matrices,  estimated  parame¬ 
ters. 

(4)  Multinormal  distributions,  unequal  covariance  matrices,  Bayesian  or  predic¬ 
tive  approach. 

The  nonparametric  method  is: 

(5)  Direct  estimation  of  the  density  functions  using  a  kernel  method. 

Later,  Remme,  Habbema  and  Hermans  (i960)  carried  out  a  simulation 
study  to  compare  the  performances  of  methods  1.3  and  5  above.  Their  simula¬ 
tions  show  that  the  performance  of  the  kernel  method  was  either  be.tter  or  as 
good  as  the  performance  of  other  methods,  except  in  the  simulations  with  mul¬ 
tinormal  distributions  with  equal  covariance  matrices.  It  performed  increas¬ 
ingly  well  with  increasing  sample  sizes,  however,  the  improvement  was  very  slow 
in  samples  simulated  from  lognormal  distributions. 

Another  nonparametric  classification  method  is  given  by  Chi  and  Van  Ryzin 
(1977).  Their  procedure  is  based  upon  the  idea  of  a  histogram  density  estimator 
but  bypasses  the  direct  density  estimation  calculations. 

In  most  of  the  references  listed  above  the  approach  has  been  to  estimate 
each  density  separately  and  from  this  form  an  estimate  of  the  posterior  proba¬ 
bilities.  By  the  Neyman-Pearson  lemma,  we  know  that  if  we  want  to  classify  an 
object  as  coming  from  one  of  two  populations  with  densities  / 1  and  /  2,  we 
should  base  the  classification  on  the  likelihood  ratio  f  \/  J  2  and  hence  it  would 
be  attractive-  to  have  a  method  to  estimate  the  likelihood  ratio  directly.  Silver- 
man  (1978)  considers  the  direct  estimation  of  the  log  likelihood  ratio  for  one 
dimensional  data.  He  assumes  that  g  = Log  (/  1//2)  is  in  Cz{I),  where  I  is  some 
interval  containing  all  the  observations.  He  finds  the  conditional  Log-likelihood 
of  g  and  penalizes  it  according  to  the  smoothness  of  g  using  fig")2  as  the 


smoothing  penalty  functional.  He  estimates  g  by  maximizing  the  p  .iilized  log 
likelihood  and  shows  that  the  estimate  is  a  cubic  spline.  However,  he  does  not 
give  a  data-based  method  to  choose  the  smoothing  parameter.  In  chapter  5  we 
extend  the  result  in  Silverman  (1978)  to  the  d-dimensional  case. 

Van  Ness  (1980)  studied  the  behavior  of  the  most  commonly  used  discrim¬ 
inant  analysis  algorithms  as  the  dimension  is  varied.  He  found  that  non¬ 
par  ame  trie  Bayes  theorem  type  algorithms  perform  better  than  the  parametric 
(linear  and  quadratic)  algorithms.  He  also  found  that  the  choice  of  the  degree  of 
smoothness  must  be  done  with  great  care  or  the  performance  of  these  non- 
par  ame  trie  algorithms  can  be  very  poor. 

Anderson  and  Blair  (1962)  introduce  penalized  maximum  likelihood  esti¬ 
mates  in  the  context  of  logistic  regression  and  discrimination.  They  obtain  esti¬ 
mates  of  the  logistic  parameters  and  a  nonpar ametric  spline  estimate  of  the 
marginal  distribution  of  the  regressor  x.  See  also  Anderson  and  Senthilselvan 
(I960),  who  use  penalty  methods  for  the  hazard  function  in  one  dimension. 


CHAPTER  2 


SPLINE  ESTIMATES  OF  POSTERIOR  PROBABILITIES 


2. 1  Motivation 


Let  Y\,  ....  Yn,  n:=n1+n2  denote  the  combined  sample  from  the  two 
populations  A\  and  A%  and  define  the  random  variable 


Zi  := 


1  if  Yi  eAt 
10  if  Yt  eAz 


(2.1.1) 

Let  and  g2  be  the  prior  probabilities.  Our  objective  is  to  estimate  the  poste¬ 
rior  probabilities 


Pi  = 


Ml 


J  =  1,2. 


9l/l  +  92 f  2 

A 

Since  jD]  +  p2  =  1.  it  is  enough  to  obtain  an  estimate  p  of  p  :-Pi  and  then  the 
estimate  otp2  is  simply  Pz  =  1  —p. 

In  applications  the  prior  probabilities  are  usually  unknown,  and  hence, 
instead  of  estimating  p  we  consider  the  estimation  of 

\ 


h  =  •  , 

wifi  +  wzf  2 

where  tu  1=n1/  n  and  wg^rig/  n .  Then  if  h,  is  an  estimate  of  h. 


(2.1.2) 


-  _  (9i/^i)* 

P - - - - 

(9 1/  w\)h’  +  (92/^2)(l-^) 

is  an  estimate  of  p . 

We  can  think  of  the  vector  Z=(Z\ . Zn )*  of  zeroes  euid  ones  as  noisy 

observations  on  the  values  h{y  To  see  this,  note  that,  if  we  draw  am 

observation  Y  from  the  density  fj  with  probability  Wj,  j  =  1,2,  and  Z  is  the 


random  variable  which  is  1  or  0  according  as  j  is  1  or  2,  then 

E(Z\Y=y)  =  h.(y). 


We  will  assume  that  all  we  know  about  h  is  that  it  is  a  "smooth"  function 
such  that  0£h£l.  For  example,  in  the  one  dimensional  case  recall  that  h" 
measures  curvature  and,  hence,  we  could  estimate  h  by  minimizing: 


QxW  •=  WVi)-**)2  +  (2.1.3) 

n*»i 

subject  to 


OsMsJsi  i- 1 . 

where  Jz(h.)  =  f(h”  )2,  and  X  is  a  parameter  that  controls  the  tradeoff  between 
closeness  to  the  data,  as  measured  by  the  first  term  in  (2.1.3),  and  smoothness 
as  measured  by  Jz  The  points  slt  ...  ,sk,  should  form  a  sufficiently  fine  mesh 
so  that  a  smooth  function  that  satisfies  the  constraints  at  these  points  will 
appear  to  satisfy  them  over  a  set  S,  where  S  is  such  that 


£  t »jJi  >  e  >  0 

for  any  y  €.S,  and  some  given  e>0. 

In  section  2  of  this  chapter  we  will  define  the  class  of  functions  H 
where  the  minimization  of  (2.1.3)  occurs.  The  function  h  in  H  which  minimizes 
(2.1.3)  is  a  piecewise  cubic  spline  (see  Schoenberg.  1964). 

The  generalization  of  Jz  in  two  dimensions  is 


•w  =  /£§] 


(2)) 


a(3/(i)Va(v(2)) 


2  -i 


<*vU)dy(2) 


In  two  dimensions  the  minimizer  of  (2. 1.3)  is  called  a  thin  plate  spline  (Meinguet, 
1979)  because  of  the  analogy  to  minimizing  the  energy  of  a  thin  plate  of  infinite 
extent.  The  reader  is  referred  to  Wendelberger  (1982)  for  a  nice  physical 
interpretation  of  the  piecewise  cubic  spline  in  one  dimension  and  the  thin  plate 
spline  in  two. 
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In  more  than  two  dimensions  the  name  thin  plate  spline  does  not  seem 
adequate  and  hence  we  will  use  the  name  Laplacian  smoothing  splines,  sug¬ 
gested  by  Schoenberg  (see  Wahba,  1979b).  and  adopted  by  Wendelberger  (1982). 

We  can  also  consider  a  more  general  penalty  functional  than  J 2,  that 
is,  we  will  consider  a  penalty  functional  Jm  to  be  defined  in  the  following  section, 
which  involves  the  partial  derivatives  of  total  order  m. 

The  problem  of  estimating  h  is  only  a  particular  case  of  the  more  gen¬ 
eral  problem  of  estimating  a  smooth  function  f  when  we  have  noisy  observa¬ 
tions  on  the  values  that  /  takes  at  certain  points  in  I?4,  or  on  the  values  of  some 

linear  functionals  . applied  to  /  ,  and  we  also  know  that  /  lies  in 

some  closed  convex  subset  of  functions. 

Following  Craven  and  Wahba  (1979),  Wahba  and  Wendelberger  (1980) 
and  Wahba  (1962)  we  deal  with  this  more  general  problem  in  the  following  sec¬ 
tion. 

2.2  A  general  minimization  problem 

The  function  space  results  in  this  section  are  given  in  Meinguet  ( 1978)  and 
Duchon  (1976). 

Let  H{m,d )  be  the  vector  space  of  all  Schwartz  distributions  for  which  all 
the  partial  derivatives  in  the  distributional  sense  of  total  order  less  than  m  are 
square  integrable  over  .  This  definition  is  given  by  Meinguet  (1978).  The 
space  H(m,d)  is  called  a  generalized  Beppo  Levi  space  of  order  m  over  ff* 
Adams  (1975)  calls  this  space  a  Sobolev  space. 

Let  Ho{m  ,d )  be  the  space  of  polynomials  on  If1  of  total  degree  less  than  m. 
Then  Hoim.d)  is  an  M-dimensional  subspace  of  H(m,d),  where 


2 


and  let  Hi(m,d )  be  the  orthogonal  complement  of  Ho(m,d )  in  H(m,d),  that 


H(m,d)  =  Ho(m,d)  ©  H\[m,d) 

where  "  ©  "  indicates  direct  sum  (see  for  example  Akhiezer  and  GLazman,  1961). 

Let  Ty  -  i,  ...  ,ty\.  ti  eR*.  i=i . M.  be  an  M-unisolvent  set.  that  is.  a 

set  of  M  elements  of  R*  such  that  there  exists  a  unique  ^e//0(7n  ,d )  with 

=  Pj.  Pj  SB,  J=1 . M,  tjZTy. 

Lety=(j/(1) . y(d))*eR*,  /  :R*-»R  /  e//(m,d).  Leta=(a(l) . a(d))* 

d 

and  define  |a|  :=  ja(j).  a(j)€[0,l . m|.  Define  the  differential  operator 

im  1 


2?“  by: 


(2.2.1) 


If  2m  >  d,  the  space  H{m,d),  equipped  with  the  inner  product. 


where 


<f.9>H  -  </  .9>o+<f  .9>i 


<f  >9  >o  =  £  /  (tj  )9  Vj)  .  ery 
b-i 


</’S>1  =  m' .  o(d)!  |<Z,V 

can  be  shown  to  be  a  reproducing  kernel  Hilbert  space  (RKHS).  that  is,  a  Hilbert 

space  where  the  evaluation  functionals  are  continuous. 

Using  the  results  of  Duchon  (1976)  and  Meinguet  (1979)  it  cam  be  shown  (see 
Wahba  and  Wendelberger.  1960  and  Wendelberger,  1962),  that  the  reproducing 
kernel  of  H{m,d )  is  given  by. 


K(s ,t)=Q(s ,t)  +  F(s,f);  s.feR* 


(2.2.2) 
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where 


u  imX  Sml  (2.2.3) 

+  £  Pi(s)pj(t)Em(ti,tj) 
ij-i 


<■1 

Here Pj ,  j  =  are  the  unique  M  polynomials  in  H^m ,d)  satisfying 

*t(*i)  *  0  if  iVj 

t .  .  .  ,  tu  e  ru  and  Em(s  ,t)  s  ,t  e  R*,  is  given  by: 


(2.2.4) 


(2.2.5) 


where 


Em(s,t)  = 


‘,J|s  —  t  j|2m“tfln(||s-f  ||)  if  d  is  even 

m||s_*||2m-d  if  cf  is  odd 


r_  j^d/2-M+m 


22m~17rd/2(m  - 1)!  (m  —d/  2)! 
—  T(d/  2 — m) 


-  d  even 


d  odd. 


Let  L\ . L*,N i . jVfc,  be  continuous  linear  functionals  defined  in 

H(m,d).  Suppose  that  we  observe  Zi-l\J  +£».  i=l . n.  Where  . en  are 

independent  zero  mean  random  variables  with  variance  covariance  matrix  given 


< :=  oz(6iio?),i,j  =  1 . n 

where  a*  <“>,  t  =  1 . n  and 


jj  _  J  1  if  t=; 

"[0  if 

Here  o2  is  an  unknown  constant,  The  < 7*  are  the  relative  weights  of  the  measure¬ 
ment  errors  c% .  If  the  variances  of  c*  are  known  to  be  the  same,  then  we  set 


We  will  also  assume  that  we  know  that  the  function  f  is  In  some  closed  con¬ 


vex  set  C  that  can  be  well  approximated  by  the  set 

Cfc  =  . k 

consisting  of  a  finite  intersection  of  half-spaces. 

Our  objective  is  to  estimate  the  function  f  by  solving 


Problem  2.2.1: 

Minimize  over  H(m,d) 

“2  (*t  -LiJ  )2/  iS )  (2.2.6) 

n<«i 

subject  to 


7  =  1 . *• 

The  penalty  functional  Jm(f )  is  given  by: 


(2.2.7) 


/»</)-  2 

I  «!■»*» 


m! 


I  Wllfcp.) 


(2.2.8) 


dj!  •  •  •  ad! 

A  "constrained  Laplacian  smoothing  spline"  is  the  function  /nx  that  solves 
problem  (2.2.1). 


Clearly  the  problem  of  estimating  the  function  h  of  section  2.1  is  a  particu¬ 
lar  case  of  problem  (2.2.1)  when  L\,  .  .  .  are  evaluation  function¬ 

als. 

.  Since  Z*  and  Nj  are  continuous  linear  functionals,  by  the  Riesz  representa¬ 
tion  theorem  (theorem  Al.l)  there  exist  functions  r\\ . and  . 

in  the  space  H(m  ,d)  such  that ' 


hf  i  =  l,...,n  (2.2.9) 

7  =  t . n •  (2.2.10) 

The  functions  tj*  and  are  called  the  representers  of  Z*  and  Nj .  Any  function  f 
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in  the  Hilbert  space  H{m,d )  can  be  written  as  a  linear  combination  of 

?7i . rjn,  £j,  .  .  .  ,  pj,  .  .  .  ,py  plus  some  function  p  which  is  orthogonal 

to  eachrji.  fy.  andpj,  that  is, 


f  =  £ciVi  +  +  £  2j3t  +  p  (2.2.11) 

<■1  i-i  l«i 

for  some  vectors  of  constants  c  =(e i,  .  .  .  ,  cn)*,  6  =(6  . . 6*)*  and 

3=(3j . Sjf)1.  where 


<rjilp>  =  <^,p>  =  <Pi,p>  -  0. 
for i  =  l,. ...n;  j  =  { . k  and 

Substituting  (2.2.9).  (2.2.10)  and  (2.2.11)  in  (2.2.6)  and  (2.2.7)  it  is  easy  to 

show  that  to  solve  problem  (2.2.1)  one  should  take  p=0,  so  that,  the  solution  /nX 
can  be  written  as: 

J»\=  i  ciVi  + £ m*  +  £  sttPt 

i«l  /•  1  i-1 

By  proposition  A1.3,  Vi  end  are  given  by: 

r)i(s)=Li{t)K(s  ,f) 

tjis)=NJlt)K(s,t). 

The  subscript  (t)  indicates  that  the  functional  Z*  (  or  Nj  ),  is  to  be  applied  to 
what  follows  considered  as  a  function  of  t. 

Using  the  Kuhn  Tucker  theorem  we  will  give  a  simpler  representation  for 
the  solution  to  problem  2.2.1,  but  first  we  need  to  introduce  more  notation. 

First  rewrite  /nX  as 

/nA=.  [jt]°  +J}<2  (2.2.12) 

where  . UY-V-iVi . VkY ■  «=(c*  :  6*)*  andp=(pi . puY 


Define  the  matrices: 


tj. 


2 ]•  ^2=[^21  •  Kzz\  Sind  £ 
1  ^12>  ^22*  a^d  ^2  are 


TABLE  2.2.1 


Matrix  Dimension 


(i.j)element 


n  x  M 
LxM 


A(,)W0M)  i=l . n  j=l . n 

WMt)K(s,t)  i=l . n  j=l . L 

f*ita\Njtt\K(s  ,t)  i=l . L  j=l . L 

Ltpj  i=l . n  j=l . M 

NiVi  •  i=l . L  j=l . M 


We  will  assume  that  the  following  two  conditions  hold: 


CoodLUon  2.2.1 


Tjj . Tjn.  fi.  -  •  •  .  are  linearly  independent. 


Condition  2.2.2 

The  rank  of  the  matrix  Ux  is  M. 

For  example,  in  the  case  where  I*/  =/(sa).  i=l . n  and  Njf  =  /(*/). 

J=1 . k;  if  the  points  yx . yn,a j,  .  .  .  ,sk  are  all  distinct,  condition  2.2.1  will 

be  satisfied.  To  satisfy  condition  2.2.2  we  need  that  n&M  and  that  the  points 
Vj,  .  .  .  ,yn  uniquely  determine  an  interpolating  polynomial  of  degree  m-1. 

The  following  theorem  gives  a  simpler  representation  for  the  solution  to 
problem  2.2.1. 

2.2.1  Theorem 

If  conditions  2.2.1  and  2.2.2  hold,  the  solution  to  problem  2.2.1  is  of  the 
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/n\( 0  =  SctA(*)^m(s.O  +  £tify(9)Gm(*'t)  +  E  <W 0-  (2.2.13) 
i»l  j=l  l»l 

where  <p\ . tpy  are  the  monomial  polynomials  of  degree  less  than  m  given  by 

<Pi(t)  =  t  eR*  (2.2.14) 

Here  mW  e  (0.1 . wi-lj,  v.(U)+...+n(dL)  <  m,  Z  =  1 . M. 


Proof 


so  that 


Similarly. 


A/»x  =  LiW  ■  $*]»  +  Ap‘2 


Lj 


nA 


|*l/«X 


=  [/fn  :  K12  a  +  Uxd 


=  [A'zi :  /Taja  +  M. 


i^x/nxl 

Let  P  be  a  projection  operator  onto  the  space  Hi(m.d'),  then: 


Jmifni)  =  <PJn\'PfnK>  =  <[*7 :  fKfa  :  f]*> 

=  a*  Ad. 

Now  we  can  write  problem  2.2.1  as: 

Minimize  G(a,d)  subject  to  g  (a,d)iO,  where 


G(a,d)  =  ||[A'n:A'12]a  +  C/ia-i 


[Kn:Klz]a  +  U^-z 


(2.2.14) 


+  ^Aa‘Ab 


g{a,d)  =  [K21:Ka]a  +  U2d^r  (2.2.15) 

and  DfZ  =  diag  (1/  <7?.  .  .  .  .  1/(7*).  The  Hessian  of  the  quadratic  form  G(a,d) 


is  given  by: 


*11 

*12 

»■* 

*13 

)>I 

11 

*21 

*22 

*23 

*31 

*23 

-33 

where  *2i=*i2>  —31s— 13.  *32=*is  ®nd: 

*11  =  KnDgZK\\+n\Kn  *12  =  zA'i2+nA.A'i2 
*13  —  KnDgZU  1  *22  =  Kz\D  gZ  K  \2+n\K22 

Ha  =  KZ1D~ZU:  H33  =  E/U^i 

It  is  easy  to  see  that  if  conditions  2.2.1  and  2.2.2  hold  the  Hessian  H  will  be 
positive  definite  and  hence  the  solution  to  problem  2.2.1  exists  and  is  unique. 

Now  let  7  be  a  fcxl  vector  of  Lagrange  multipliers.  By  Kuhn-Tucker 

theorem  (Bazar&a  and  Shetty,  1979,  theorem  4.3.6),  we  have  that  if  (a ,5)  is  the 
solution  to  problem  2.2.1,  then  the  following  holds: 

VCr(a,3)  +  V0(a,2)*7  =  0  (2.2.18) 

7*0  (£.3)  =  0  (2-217) 

7fc0 

(2.2. 17)  =^0  (a,3)tyytg  (a, 5)  =  0 

and  therefore 

G(a,d)  =  C(a,2)+ j0(o,2)i77t0(a,d) 

=  ±<^2+  £/j3— *  Y  Df-  (^0+ t/,2-* )+  Ka 

+  U2d-T)tyyt{K2a.+  U^d-r) 

=  Ufa*  Ud-w)*  S  {Ka+  Ud-w)+  Z^oAh  (2.2.18) 
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_  k2 

S  '~  rf  ■ 

Also  g  (o,3)  can  be  written  as 


0(3,3)  =  [4  :  Okn][KS.+  Ud-w] 

(2.2  19) 

where  4  is  a  kxk  identity  matrix  and  Qtn  is  a  A:  xn  zero  matrix. 

and  (2.2.19)  the  Kuhn-Tucker  conditions  become: 

Using  (2.2.18) 

VaC(3,3)+Vo0(3,3)t7  =  KSKa  +  KSUdL  -  ASu> 

+nXAfa  +  9  ?  ~ 

(2.2.20) 

vdc(3,3)  +  vdg  (3,3)7  =  tf‘si/3  +  £/‘sa3 

-U*Sw  +  U*  0*7=  0 

(2.2.21) 

y  0(3,3)  =  y  [4  :  (4n][Aa+  C/S-ui]  =  0 

(2.2.22) 

and 


Now,  (2.2.20)  implies  that 


Aj,SAa+  S UcL—Svj  +7i Xa+  y  =0. 


U*  SKcl+SUcI—Siu  +  Iq0  7  =0 


And  (2.2.21)  implies 


Then,  using  (2.2.23)  and  (2.2.24)  we  get  that  —nXU*  o=0  =>Ula=0 


(2.2.23) 


(2.2.24) 


=*>U\c  +  £/|6  =  0  (2.2.25) 

and  hence 

7n*(0  *  ££*£*(, )/f(s,f)  +  £  bjNj(a)K(s ,t)  +  £3£/>t(f)  (2.2.26) 

i*l  j'*l  1*1 

Substituting  (2.2.2),  (2.2.3)  and  (2.2.4)  in  (2.2.26)  and  grouping  we  obtain: 
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7nx(0  =  £?i4(s)5»(*.0+  iSjNjMEnis.t) 
**1  i*  1 

+  £&-<Op*(0  -  2  (£/*,£+ £/|6)tfm(f.f|) 

i-i  i-i 

+  2  2(^+£^6)£'m(«i.«i)A(0 

i-U-1 

where  dj’is  given  by 


<V=  2«i4^(*i.«)  =  i*JNmEm(tl.s) 

t*l  fml 

n  _  fc  „ 

+  2c<A(«)pi(s)+  2  bjNj(*)Pt(s)- 

*■1  y»i 

Then  by  (2.2. 25)  we  have  that 


(2.2.27) 


7»aU)=  2c<4(,)£W(s.0+ 

i»l  j»l  1*1 

where  3t=d(— dj*  Let  $pj,  .  .  .  , qpjf  be  as  in  (2.2.14).  Then  [<Pi\iLl  form  a  basis 

U 

for  B0(m,d).  then,  since  Pi  =  2  ^IjVj  for  some  1 hjj*i  30  that  we  can  substi- 

y-i 

tute  the  basis  [pt j/Lj  by  the  numerically  more  convenient  basis  j<p£  j^j.  That  is, 
we  can  write: 


U  M 

2  ^iPi  =  2^i 

t«i  i-i 

for  some  (d1(  ....  dy)1  eR*.  Therefore,  the  solution  to  problem  2.2.1  can  be 
written  in  the  form  (2.2. 13). 


Now  let  us  define  the  matrix 


where  T £  and  Tg  are  given  in  Table  2.2.2. 


(2.2.28) 


Upon  replacing  the  matrices  UltUz  and  U  by  7*i <TZ  and  T  respectively, 
expressions  (2.2.16)  through  (2.2.25)  still  hold  with  d  instead  of  d  and  the  condi- 
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TABLE  2.2.2 

Matrix 

Dimension 

(i.j)element 

E 12 

Ezz 

Ti 

Tz 

n  x  n 
n  x  L 

L  x  L 
nx  M 
LxM 

Li(s)Eut)Em(s ,t)  i=l . n  j=l . n 

Li(.s)^i{t)Em(s ,t)  i=l . n  j=l . L 

Ni(s)N][t)Em(,s  >0  . L  j  =  l L 

Li<Pj  i=l . n  j=l . M 

Ni<Pj  i=l . L  j=l . M 

tion  Ula  =  0  is  replaced  by  the  condition  Tl  a  =  0.  Then  we  can  rewrite  prob¬ 
lem  2.2.1  as: 


Problem  2.2.2 

Minimize  G(a.,c£)  subject  tog(a,d)^0  and  T*a=0,  where  now, 

G(a,d)=  ^{E^a  +  r,d  — 2  )*  DgZ  +  T^d  — z  ) +  Ea  (2.2.29) 

g(a,d)  =  E&i  +  T&i-r  (2.2.30) 

where: 

E\\  Eiz 

Ezi  Ezz  ' 

Ei  ~  n  :  E\z\-  E2  ~  [EZ\  :  £22]  and  E21=E\2,  En,  E 12  and  E&  are  given  in 
Table  2.2.2. 


Let  N=n  +k ,  since  the  rank  of  T  is  M,  there  exists  an  N X/V  matrix 
Q  =  [<?j  :  Qz\  such  that 


Q\ 

Qi 


=  !?! 


(2.2.32) 


where  R  is  an  MxM  upper  triangular  matrix,  0  is  an  N-MxM  zero  matrix  and 
Qi  and  Q2  are  NxM  and  NxN—M  matrices.  This  is  known  as  the  Q-R  decompo¬ 


sition  of  T  (see  Dongarra,  Bunch,  Moler  and  Stewart,  1979). 


Let  e  be  the  N—M  dimensional  vector  such  that  a  =  Qze ,  then,  by  (2  2.32) 
we  have  0  =  Ti  Q2e  =  Tla  so  that  instead  of  solving  problem  (2.2.2)  we  solve 


the  quadratic  programming  problem: 


Problem  2.2.3 

Minimize  G*(e  ,d)  subject  to  g  *(e  ,d)  i  0.  where: 

G*(e  ,d)=  ^(Ex  Q^e  +  Txd  -z )‘  D  ~z  (Ex  Qze  +  Txd  -z  )+nXe 4  QzEQ&  (2.2.33) 


g'(e,d)  =  igGa®  +  ^d- 


(2.2.34) 


Then,  if  (e,  d)  solve  Problem  2.2.3,  (a,d)  solve  Problem  2.2.2,  where  o=<?2e 

For  a  given  value  of  X  we  can  solve  problem  2.2.3  using  any  quadratic  pro¬ 
gramming  routine.  So  far,  nothing  has  been  said  about  the  choice  of  the 
smoothing  parameter.  In  the  following  section  we  describe  the  method  that  we 
will  use  to  choose  a  "good”  value  of  X  from  the  data. 


1.4  The  choice  of  the  smoothing  parameter 


In  real  life  problems  the  correct  value  of  the  smoothing  parameter  X  is  not 
known.  Wahba  and  Wold  (1975),  Craven  and  Wahba  (1979)  and  Golub  Heath  and 
Wahba  (1979)  have  suggested  the  use  of  generalized  cross-validation  to  estimate 
X  from  the  data  in  the  unconstrained  case.  In  the  presence  of  linear  con¬ 
straints,  Wahba  (1960)  and  (1962)  suggested  the  use  of  generalized  cross- 
validation  for  constrained  problems. 

Before  describing  the  method  of  generalized  cross-validation  for  con¬ 
strained  problems,  which  we  will  refer  to  as  GCVC,  we  give  a  brief  review  of  the 
method  of  generalized  cross-validation  for  unconstrained  problems  which  will  be 
referred  to  as  GCV. 

Let  /-x  be  tbe  mini mizer  of 


23 


£  (A/-*i)2  +  ^m(/)  (2.3  1) 

71  i  ■!.*#* 

Al] 

If  X  is  a  good  choice,  then,  on  the  average,  Lqfn\  —  z?  should  be  small  and  this  is 
reflected  in  the  ordinary  cross-validation  function  V0  (X)  given  by: 

K(X)  =  (Z,/nX-*?)2-  (2  3  2) 

n?«i 

Craven  and  Wahba  (1979)  and  Golub,  Heath  and  Wahba  (1979)  showed  that 


y  s  1  [Ljf  nX"*^] 

°(  5  n£x  (1  -o*(X))* 


(2.3.3) 


where  /nX  is  the  mimmizer  of 


Q\(f  )  =  U*/  ~zt)2/  ^i2  +  X-/m  (/  )  (2.3.4) 

71  i  *1 

and  is  the  (i.i)  entry  of  the  nxn  matrix  A (X)  satisfying 

^\1  nX 

=  d(X)z.  (2.3.5) 

»X 

The  minimization  of  (2.3.2)  with  respect  to  X  requires  the  solution  of  a 
linear  system  of  order  n+M-1,  n  times  for  each  different  value  of  X,  whereas  the 
minimization  of  (2.3.3)  requires  the  solution  of  one  linear  system  of  size  n-t- M  to 


And  and  then  one  of  size  n  to  And  a*  (X)  for  each  value  of  X. 


Craven  and  Wahba  (1979)  and  Golub,  Heath  and  Wahba  (1979)  show  that 
from  the  point  of  view  of  minimizing  the  predictive  mean  square  error  given  by 


T(X)  =  i-tUifnX-Lif)2.  (2.3.6) 

ni-l 

V0  (X)  should  be  replaced  by  the  generalized  cross-validation  function  V(X)  given 
by: 
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rn  _  1  A  (4/nx-e^^n  _  ir^W).W 
V(X)  -  —2  -■■  ^/(x)  -  -7- - nr 


*»«i  (l-o^X))2 


(/— 4(X))  I 


(Z3.7) 


where 


«*<*)  =  ^3-M— . 

They  show  that  the  minimizer  of  (2.3.7)  estimates  the  minimizer  of  (2.3.6). 

Wendeiberger  (1961)  developed  an  efficient  algorithm  to  compute  the 
minimi  gar  of  (2.3.4)  estimating  X  by  minimizing  (2.3.6). 


Now  let  C  be  any  closed  and  convex  set  in  H(m,d),  fnK  be  the  minimizer 

«[*) 

of  Q\(J )  in  C  where  Q\  is  given  in  (2.3.4)  and  let  fn\  be  the  minimizer  in  C  of 

ZT  2  (4/-*i)2  +  M«(/)-  (2.3:10) 

ni*l7i+q 

The  ordinary  cross-validation  function  V0  (X)  is  given  by 

n(*)  =  M:  (4,/nx-*,)2- 

It  is  obvious  that  Vg  (X)  would  be  prohibitive  to  compute  in  most  cases. 

Wahba  (1960)  shows  that  given  the  data 

r  -w  |< 

[*  1*  •  •  •  •  nX  •  •  •  >  zn  j 

-fa) 

the  minimizer  of  )  in  C  is  fn\ ,  that  is. 

[{1 

/nxt*  +*q  ]  =  7nX  f>  ]  (2‘3- 1 15 

The  notation  fn\[z+6q]  indicates  that  fn\  is  the  minimizer  in  C  of  Q\(f ) 
based  on  the  data  vector  z  +6q .  where  <57  is  given  by: 

Ail 

=(0 . 4,/nX  [*]-», . 0)*. 

Ail  , 

and  fnx  [z  ]  is  the  minimizer  in  C  of  (2.3. 10)  based  on  the  data  vector  z 
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Using  (2.3.11),  Wahba  (1981a)  shows  that  ordinary  cross-validation  can  be 
written  as 


where 


(4/nA-*, 


>2 


i  (l-a^(X))2 


(2.3.12) 


<,(X)  = 

is  what  Wahba  calls  the  "differential  influence"  of  z,  when  X  is  used 

The  GCVC  function  is  obtained  by  replacing  a ^  in  (2.3.12)  by  the  average 
"differential  influence”,  so  that  the  GCVC  estimate  of  X  is  obtained  by  minimizing 

r-Ifi,/,-*,)2 

V^X)  =  — - -  (2.3  13) 

(1-^-S  <*W 

n*»  1 

As  we  mentioned  in  section  2.2  we  will  assume  that  the  convex  set  C  can  be 
well  approximated  by  the  intersection  of  a  finite  number  of  half  spaces: 


f7nx[2+<5vJ“A;/nx[2J 

1151 

LqJnX  [*]-*? 


Ck  -  f  :  Ni/rSVi,  i-l . k  .  (2.3.14) 

Then,  to  evaluate  V^(X)  for  a  single. value  of  X  we  need  to  solve  n  quadratic  pro¬ 
gramming  problems  in  n+k—M  variables.  To  avoid  this,  Wahba  (1981)  sug¬ 
gested  using  the  approximate  generalized  cross-validation  function  given  by: 

l&pM  =  n -  (2.3.15) 

<i~E 

71 7-1 

where 


CHAPTER  3 
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THE  ALGORITHM 


3.  l  Introduction. 

The  software  written  as  part  of  this  thesis  was  developed  for  the  case  where 

the  functionals  L\ . Z*  and  Nx,  .  ...  Nk  are  evaluation  functionals.  That  is, 

we  want  to  minimize 

+  *•/«(/) 

n  i-i 

subject  to  f  (Sj  ,  for  i  =  t . k .  However,  we  present  the  algorithm  in  its 

more  general  form,  where  the  Z*  and  Nj  are  any  continuous  linear  functionals. 

Before  solving  problem  (2.2.1)  we  solve  the  unconstrained  problem  estimat¬ 
ing  the  value  of  A  by  generalized  cross-validation.  The  software  to  solve  the 
unconstrained  problem  in  the  case  where  the  Z*  are  evaluation  functionals,  was 
developed  originally  by  Wendelberger  (1961)  and  can  be  obtained  from  Madison 
Academic  Computing  Center  (1961)  or  from  IMSL  (1963). 

If  the  solution  for  the  unconstrained  problem  satisfl.es  all  the  constraints, 
then  that  is  also  the  solution  to  problem  2.2.1. 

If  this  is  not  the  case,  we  use  the  value  of  A  obtained  from  the  solution  of 

the  unconstrained  problem,  say  A0  as  a  starting  guess  for  the  ’’correct”  A  for 
problem  2.2.1.  In  fact,  since  the  imposition  of  constraints  is  in  some  sense  a 
kind  of  smoothing,  it  is  natural  to  expect  that  an  optimal  A  for  the  constrained 

problem  will  be  smaller  than  Also,  intuitively,  the  optimal  A  for  the  con¬ 
strained  problem,  say  A  should  not  be  "too  far  away”  from  A0 .. 
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to 


Following  section  2.2  it  is  easy  to  see  that  problem  2.2.3  is  equivalent 


Problem  3.2.1 

Minimize 

G*(a9,d)  =  +  rfd-za]‘  (fffa,  +Tfd  -  «,]  +  nXa^a, 

subject  to 

g°(a9,d)  =  E$a.9+Ttfi-T^0  and  T** a9  -  0. 

Let  Q  =[<?i  :  and  R  be  the  Q-R  decomposition  of  V,  that  is. 

&|  “  [oh  JLxJ-  (a“> 

Let  e,  be  the  n  +k  —M  dimensional  vector  such  that  a a-Qz«a-  Then,  instead  of 
solving  problem  3.2.1  we  solve  the  equivalent 


Problem  3.2.2 

Minimize 

=  j^ErQze'+Ttd-Z'}*  (£T<?2eff+rfd-za)  +n\etaQt,E°Q2e , 

subject  to 


§'(e  a,d)  =  EgQ  #  .+  T^d&r.  (3.2.3) 

Then,  if  (ea,d)  solve  problem  3.2.2,  (a^.d)  solve  problem  3.2.1.  where 

=  Q#9 

and 


is  a  solution  to  problem  2.2.3. 
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Now. 


ti 

r 

][d]  ~  *• 

[t 

+  nAjs 


t  4l[  &FQz  G(n +*-*)(*) 
*  '  Jl0(Jf)(n+fc-lf)  Ouu 
L 


-  2, 


(3.2.4) 


(3.2.5) 


where  H  is  the  Hessian  and  is  given  by: 

(&EfE?Q2  +  n\Qt,E°Qz  Q^Tf 

T?  E*xQz  Tf  T° 

Finally,  from  (3.2.3)  we  write  the  term  £s(efft(i)  defining  the  linear  constraints 

as: 

=  [eZQz  :  r2]|edB[ 

3.3  The  quadratic  programming  algorithm 


Let  /nA(f).  given  by 

7nx(0  =  IlBiLi(a)Em(s,t)  +  E  bj-Nj (s)Em (s.f)  +  0  (3.3.1) 

»»i  i«i  t*i 

be  the  solution  to  problem  2.2.1  for  a  given  value  of  A.  Suppose  that  there  are  l 
active  constraints  at  the  solution  that  correspond  to  Nv( j),  ....  Nv(i).  where 

T:=bU) . v(0icll . k\.  (3.3.2) 

If  we  solve  problem  (2.2.1)  for  some  other  value  of  A.  say  A',  then  we  will  get 
a  possibly  different  set  T'  =  \v'  (1),  .  .  .  ,  v'  (i)l  corresponding  to  the  active  con¬ 
straints  at  the  solution.  If  A  and  A'  are  relatively  close,  it  is  likely  that  the  sets  T 
and  T*  will  either  be  the  same  or  at  least  they  will  not  be  "too  different". 


It  is  this  feature  of  our  problem  that  motivated  the  use  of  an  "active  set" 
algorithm  to  solve  the  quadratic  programming  problem.  The  algorithm  that  we 
used  was  developed  by  Gill,  Gould.  Murray,  Saunders  and  Wright  (1982).  The  idea 
behind  this  algorithm  is  as  follows. 

If  the  correct  active  set  of  constraints  were  known  a  priori,  then  the  solu¬ 
tion  to  problem  2.2.1  would  be  the  solution  to  a  problem  with  equality  con¬ 
straints.  There  are  several  efficient  algorithms  for  solving  problems  with  equal¬ 
ity  constraints  and,  in  fact,  the  presence  of  equality  constraints  actually 
reduces  the  dimensionality  in  which  the  optimization  occurs.  Therefore,  it  is 
desirable  to  apply  techniques  from  the  equality  constrained  case  to  solve  prob¬ 
lem  2.2.1.  To  do  this,  a  subset  of  the  constraints  of  the  original  problem,  called  a 
"working  set"  of  constraints,  is  selected  to  be  treated  as  equality  constraints. 
Obviously,  the  ideal  candidate  for  the  working  set  would  be  the  correct  active 
set.  Since  the  correct  active  set  is  not  available,  the  method  includes  pro¬ 
cedures  for  testing  whether  the  current  working  set  is  the  correct  one,  and 
altering  it,  adding  or  deleting  constraints,  if  not. 

In  our  problem,  every  time  we  solve  the  quadratic  programming  problem 
2.2.4  for  a  given  value  of  X,  say  X',  we  obtain  a  correct  active  set  for  that  particu¬ 
lar  X.  By  the  argument  at  the  beginning  of  this  section,  this  correct  active  set 
will  be  a  good  starting  guess  for  the  correct  active  set  for  some  other  value  of  X 
close  to  X'  and  therefore  once  we  solved  the  problem  for  the  first  time  we  can 
expect  very  fast  convergence  with  this  active  set  algorithm.  In  fact,  this  was 
what  we  observed  in  our  Monte  Carlo  studies.  In  the  cases  where  the  active  set 
did  not  change  from  one  value  of  lambda  to  the  following,  the  quadratic  pro¬ 
gramming  routine  converged  in  one  iteration.  We  will  discuss  this  further  in 
chapter  4  where  we  will  present  the  results  of  the  Monte  Carlo  study. 
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After  the  quadratic  programming  problem  has  been  solved,  the  numerator 

of  (3.4.2)  can  be  computed  easily.  To  find  the  denominator  of  V^p(A)  we  must 

n 

compute  2  which,  by  (3.4.1)  is  simply  the  trace  of  A  (A). 

4*1 

In  Theorem  3.4.1,  we  will  give  an  expression  for  A(A).  Before  stating  this 
result,  we  must  introduce  some  more  notation.  Let  the  matrices  E&CT),  2f&(T) 
and  T2(T)  consist  of  the  rows  and  columns  of  the  matrices  £’22>  E\2  and  Tz, 
corresponding  to  the  active  set  of  constraints,  that  is,  E^ziT)  consists  of  rows 

and  columns  t;(l) . v(l)  of  E&,  EfeiT)  consists  of  columns 

'u(l),  .  .  .  ,t/(£)  of  E° g,  and  T^T)  consists  of  rows  v(l),  .  .  -  ,v(l )  of  Tz.  Also 
define  the  matrices: 


and 


^(T)  = 


f? 

U  T) 


£'2ffi(T)  =  £T2(T)‘, 


E?(T)  =  ffff,  :  £f*(T)],  EH T)  =  [ff&m  :  £»(T)] 


(3.3.4) 


E*(T)  = 


E  f,  Eh(  T) 


p&(T)  EzA T)] 

Where  E\\  and  Tf  are  as  defined  in  (3.2.1).  Here  we  use  the  notation  (T)  to 
emphasize  the  dependence  on  the  set  T  of  active  constraints. 

Now  we  can  rewrite  problem  3.4. 1  as: 


Problem  3.4.2 
Minimize 

C(a,.d)  =  ^f(T)a,+  7Td-.,)‘(^f(T)a+rr-«.) 


(3-4.5) 


subject  to 
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g(a9,d)  =  ^|(T)aff+r2(T)-r(T)  =  0  (3.4.6) 

where  r(T)  =  (rv(1) . rv{i)Y 

Let  ^>i(T),  ^a(T)  and  R( T)  form  the  Q~R  decomposition  of  T°(T),  that  is, 
they  satisfy. 


Gi(T)  Tact's  -  tff(T)l 

^(DJTO-L  bi- 

Here  /?(T)  is  MxM ,  0  is  (n+L—m)xM ,  <?j(T)  is  i/x(n+f)  and  £2(T)  is 
(n+l— Af)x(n+i).  In  the  proof  of  theorem  3.4.1  we  will  assume  that 
E0(T) <?g(T)  is  positive  definite.  This  will  hold  if  conditions  2.2.1  and  2.2.2 
hold  (see  Dyn  and  Wahba,  1962). 


3.4.1  Theorem 


Let  /nX  be  the  solution  to  problem  3.4.1,  then  there  exists  a  matrix  4(A) 
such  that 

Lihx 

4 nf  n\ 

and  it  is  given  by: 

A(\)  =  /n-nA^J^^^CD+nXfK)^!)]'1^) 
where  G2CO  is  obtained  from  the  Q-R  decomposition  of  r°(T),  In  is  the  nxn 
identity  matrix  and  W  is  given  by. 

4  Qu 

Ws[ob,  ' Ou  ■ 

Proof 


=  A(\) 


(3.4.5) 


(3.4.6)  =*>fi?  (a,,  d)]*f3  (□.,,(£)]  =  0  and  therefore  we  can  write  (3.4.5)  as: 
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S(a v,d)  =  C(aff,d)  +  (aa,dY g (a^d.) 

=  i<Sf  (T)aff+  Tfd  -z.Y  (£f  (T)a.+  T?d-za) 

+  j(Eg  (T)a„+  Tz(y)d-r  (T  '))*  (££  (T)aff  +  7a<T)d  -r  (T)) 

+  ftXfl^"(T)aff  (3.4.8) 

=  ■k^(T)a<r+  7"(T )d-w)t  (E°(  T)a„+  7’tf(T)d-u/)+n  Xa^(T)aff 

c 

where  in  =  (z£  :  r(T)*)*.  Now,  write  £(aff,d)  as: 

[cu  r/jl^m+T’md  — i luj.  (3.4.9) 

Then  if  we  let  7  =  (7j . 7iV  be  a  vector  of  Lagrange_ multipliers,  problem 

3.4. 1  reduces  to  minimizing 

C(aff,d)  +  7*  [0»  :  /|][£*(T)o,+  7,'(T)d-M»].  (3.4.10) 

Using  (3.4.0)  and  differentiating  (3.4.10)  with  respect  to  a9,  d  and  7.  and  equat¬ 
ing  to  zero  we  obtain: 

r(T)^(T)aff+r(T)HT)d-r(T)w+nAr(T)o,+P(T)  y=0  (3.4.11) 

T{ T)‘  7v(T)d+7ff(T)£,ff(T)o-rff(T)u/+r*(T)<  7  =  0  (3.4.12) 

and 

[q*  :  /l][.£w(T)a  +  7’"(T)d-tu  =0.  (3.4.13) 

Then.  (3.4.11)  implies  that 

E*( T)  (E°{T)+n\In+l)a„+Ta(T)d-w  y  =0.  (3.4  14) 

Equation  (3.3.12)  implies  that 

H(T)<<H(T)d+£"(T)a,-™  +  °*  7  =0.  (3.4.15) 

Then  (3.4.13)  and  (3.4.15)  imply  that 


-nXT(’T)ta.a  =  0  =>  r®(T)aff  =  0 


and  (3.4.13)  and  (3.4.14)  imply 


(3.4.16) 


[<9bi  :  h\  ^  y+n\InHa0  =  0 


►  1-hy-  -n-XJQ*  :  /ija» 
y  ~  h  :  ^](  n^)aa 

Oil 

=  Otn  -TLXIi  a°- 


(3.4.17) 


Now.  using  (3.4.17)  and  (3.4.14)  we  get 


[^(T)+nX/n+(|a,=  -HT)<i-  y+v 

®nn  Qu 

=  -r(T)d-  ^  _^A/j  aa+u; 


=>  [fi,,(T)+nXy)atf=  -T*{T)d+w.  (3.4.18) 

Let  <?i(T),  <J2(T)  and  /?(T)  be  given  by  the  Q-R  decomposition  of  7^(T).  Let 
e  be  the  n  +Z  —M  dimensional  vector  such  that 

a*=<?2(T)e  (3.4.19) 

Then  r*(T) Qz(Xie  -  r*(T)off  =  0,  so  that  (3.4.18)  is  satisfied.  Premultiply¬ 
ing  (3.4.18)  by  <?2(T)*  we  obtain: 

9aW‘(^*(T)+nMr)a,  =  -<?2(T)‘  r(T)d  +  <?2(T)*u; 

=  Qz(.ryw 

=>  <?2(T)'(£,,r(T)+nXFI')<?2(T)e  =  <?2( T)u» 


and  by  (3.4.19) 


=*►  s  =  [<?2(T)  (£*(T)  +71  x^)^2(T)]_1^2(T)^ 

) 

a.  =  ^[^(^‘(^(D+nX^)^)]'1^!)^.  (3.4.20) 


Finally,  by  definition  of  4(X)  we  have 


3.4. 1  Proposition 


The  trace  of  the  matrix  I— A  (X)  is  given  by: 


-rruSfcr- 


(3.4.23) 


where  px . PnH-U  &re  the  eigenvalues  of  the  real  symmetric  generalized 


eigenvalue  problem: 

ATt  i  =  l,  .  .  .  ,n+L-M, 

where  Tj,  .  .  .  ,Tn*i-U  ®i*8  the  corresponding  eigenvectors. 


(3.4.24) 


Proof 


tr{I-A{  X)) 


))  =  nXfr  ^(/-MiXS-^)]-1 
=  n\tr  A(/+nX<^'1A)-1$-1 
=  nAfr  $~1A(/+»A#”1A)“1  . 


=  »A/rj$-1A(/+»X#-1A)-1j. 

Now,  let  pj . >PnH-M  be  the  eigenvalues  of  the  generalized  eigenvalue  prob¬ 
lem  (3.4.24),  then,  $“1Ari=piri,  i  =  l . n+L-M  and  hence 

Pi,  ■  ■  ■  •  Pn+i-U  are  the  eigenvalues  of  the  matrix  then  if  UDU~l  is  the 
eigenvalue  eigenvector  decomposition  of  $-1A,  with 
D  =  diag(p1 . Pn+i-u),  we  ^ve: 


tr(I-A(\)) 


=  nXfrji 


UDU~l(U[I+nXn]U 


=  nXfr(Z?[/+nXD]-1) 

r+t  -H  pj 

Si  i+^«VPi  ' 


Therefore,  the  denominator  of  v£-  (X)  is  given  by 
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[£*r(/-i«A))P  =  2  7T~-  • 

71  j  X  •  71  Ap^ 


(3.4.25) 


Using  (3.4.23)  to  compute  tr(I—  yl(A))  has  the  advantage  that  we.  only  need 
to  solve  the  generalized  eigenvalue  problem  when  the  set  of  active  constraints 
changes  from  one  value  of  A  to  the  next. 

In  the  following  section  we  present  the  step  by  step  algorithm  to  compute 
the  solution  to  problem  2.2.1  choosing  the  smoothing  parameter  A  by  general¬ 
ized  cross-validation  for  constrained  problems. 


3.5  Hie  step  by  step  algorithm 


After  the  unconstrained  problem  has  been  solved  we  have  an  estimate  of  A, 

say  Ag .  The  algorithm  to  compute  the  constrained  spline  uses  this  value  of  A  as  a 

starting  point  to  get  the  estimate  A  that  minimizes  the  approximate  GCVC  func- 

tion.  If  A„  =  <*  the  algorithm  also  requires  the  largest  eigenvalue  of  the  matrix 
<?2*  E11Q2  where  Qz  is  obtained  from  the  Q-R  decomposition  of  T° : 

qI  t°  =  0- 

This  eigenvalue,  call  itp*  is  available  from  the  routine  that  computes  the  uncon¬ 
strained  spline  using  Wendelberger's  (1961)  algorithm  (see  Madison  Academic 
Computing  Center,  1981). 

The  step  by  step  algorithm  is  as  follows: 


(1)  Compute  M  =  [m+^_1 


(2)  Compute  t3. 


(3)  Compute  T°  given  by  (3.2.1). 


(4)  Obtain  Q-R  decomposition  of  V 


r  =  [f 

(5)  Compute  EQ  given  by  (3.2. 1) 

(6)  Compute  the  Hessian 


Qz 


71XS2 

P(n+l-li)(U) 


0(U){n+l-U) 

Ouu 


First  compute 


QiEfEfQz  <&EfT f 
T*E\\Qz  TfTf 


then  compute 


=  <&E°Qz 

(7)  Compute  Matrix  defining  linear  constraints: 


QPCONS  -  [EiQz  :  Tz] 

(8)  Compute  matrix  defining  linear  term  in  the  quadratic  form  Ga(e„,d) 

QPLIMA  -  \e?Q2  :  Tf] 

(9)  Compute  linear  term  in  G9(e9,d ) 

QPLIVE  -  ^[ExQz  ■  Tf 

(10)  Construct  regular  grid  of  values  of  X  around  Xg  in  logarithmic  units,  in 
increments  of  0. 1  (see  details  at  the  end  of  the  step  by  step  algorithm). 


(11)  For  each  value  of  X  do  the  following: 

(11.1)  Solve  quadratic  programming  problem  to  obtain  e9  and  d*  using  the 
set  of  active  constraints  for  the  solution  for  the  previous  value  of  X  as 
initial  guess. 


i 

r 


E 


I 


f 


£' 

i; 

0 
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(11.2)  Compute  residual  sum  of  squares 

(11.3)  Compute  denominator  of  approximate  GCVC. 

(11.3.1)  If  the  set  of  active  constraints  for  current  value  of  X  is  the  same 
as  for  previous  value  of  X  go  to  step  (11.3.6),  otherwise  continue 
to  step  (11.3.2). 

(11.3.2)  Compute  Q-R  decomposition  of  r®(T) 

fl’ffijnT)  =  [T] 

(11.3.3)  Get  matrix  $  given  by  (3.4.22). 

(11.3.4)  Compute  &  given  by  (3.4.22) 

(11.3.5)  Solve  the  generalized  eigenvalue  problem  (3.4.24)  to  obtain 
Pi.  •  •  •  •  Pn+t-lf  ■ 

(11.3.6)  Compute  denominator  of  K£p(X)  given  by  (3.4.25). 

(11.4)  Compute  V^,  (X)  and  save. 

(11.5)  If  V^j,(X)  is  smaller  than  (previous X)  then  e„  =  e  *  and  d  =  d* 
(11.8)  Next  X 

(12)  Compute  cff  and  6 


= 


(13)  Get  coefficients  of  the  spline  r  b  and  d: 


c 

b 

— 

b 

d 

d 
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In  step  (10)  of  the  algorithm,  the  number  of  values  of  X  for  which  we  solve  the 
quadratic  programming  problem  and  evaluate  V^,(X)  is  given  as  an  input 
parameter  (see  documentation  of  routine  DSCOMP  in  appendix  A2).  The  user 

can  specify  the  number  of  values  to  the  left  (74)  and  to  the  right  (74.)  of  Xg .  We 
recommend  that  ti{  be  greater  than  74.  since  in  all  our  simulation  studies  the 

"minimizer”  of  V^p(X)  was  to  the  left  of  X0 . 

Most  of  our  simulations  were  done  with  74  =  15  and  74.  =  10.  One  should  be 
careful  in  choosing  74  and  tv  because  if  the  total  number  of  values  of  X  con¬ 
sidered  is  too  large  the  computation  of  the  spline  could  be  very  expensive.  As  a 
rule  of  thumb,  and  based  only  on  our  simulation  study,  we  would  suggest  consid¬ 
ering  between  15  to  20  values  to  the  left  and  between  6  and  10  to  the  right. 

The  grid  of  values  of  X  is  constructed  as  follows  (in  units  of  logarithm  of  X): 

A 

If  A,  <»  the  grid  is  constructed  in  equally  spaced  intervals  of  size  0.1,  that 
is,  the  grid  consists  of  the  following  values:  Log  (A„ )— 0. 174,  .... 
toy  (Xo  ) -0. 1  ,log(A0 )  ,log(A0 ) +0. 1 . log(Xg ) +0.  Itv  • 

If  Xg  =00  then  we  use  the  sample  size  n  and  the  largest  eigenvalue  p*  from 
the  unconstrained  problem  to  determine  an  upper  bound  for  the  values  of  X  that 
will  be  considered.  This  upper  bound,  call  it  X*  is  computed  as 

X*  =  10^(ti +fc  )p  * 

and  the  values  of  X  considered  are  from  largest  to  smallest:  Log  (X*), 
log(XV2.0,  toy(X*)— 3.0,  log(X’)-4.0,  (Log  (X')-4.0)-0.1, 

(toy (X*) -4.0) -0.2,  ....  (toy(X*)-4.0)-0. 1(74-3). 

In  step  (11.1)  we  use  the  routine  QPFC  to  solve  the  quadratic  programming 


problem. 


In  step  (11.3.5)  we  use  the  EISPACK  routines  REDUC,  TREDl  and  TQLRAT  (see 
Boyle,  Dongarra.  Garbow  and  Moler,  1977),  to  solve  the  generalized  eigenvalue 
problem. 

The  routines  DSCOMP  and  DSEVAL  to  compute  and  evaluate  the  spline  are 
written  In  Ratfor  in  a  VAX  11/750  under  UNIX  operating  system.  Ratfor  is  a 
preprocessor  which  translates  this  language  into  portable  Fortran.  Both,  the 
Ratfor  routines  and  the  Fortran  routines  are  available  from  this  author. 

All  the  computations  are  done  in  double  precision,  and  the  routines  are 
self-documented.  In  appendix  A2  we  list  the  ratfor  source  for  routines  DSCOMP 
and  DSEVAL  Routine  DSCOMP  is  the  routine  that  the  user  should  call  to  solve 
problem  (2.2.1).  We  also  give  the  listing  of  all  the  routines  used  by  DSCOMP  and 
DSEVAL  except  the  routines  to  solve  the  quadratic  programming  problem.  Rou¬ 
tine  DSEVAL  evaluates  the  spline  computed  by  DSCOMP  at  a  set  of  points  in  R* . 
The  calling  sequence  for  DSCOMP  and  DSEVAL  as  well  as  explanation  of  the  vari¬ 
ables  that  appear  in  the  calling  sequence  are  listed  as  comments  in  the  source 
code. 


As  we  mentioned  before,  the  algorithm  is  written  for  the  case  where 

Li . !>„,  and  Nit  .  .  .  ,  Wfc  are  evaluation  functionals,  for  example, 

Lif  =  /(y»).  i  =  l . n  and  Njf  -  /  (sy),  j  =  l . k. .  It  is  assumed  that 

the  n+k  points  are  different  so  that  the  generalized  eigenvalue  problem  in  step 
11.3.5  can  be  solved.  In  the  near  future  we  plan  to  incorporate  the  handling  of 
replicates  in  the  algorithm.  One  possible  strategy  to  handle  replicates  is  the  fol¬ 
lowing:  suppose  that  we  have  7i*  replicates  at  the  point  yt,  and  denote  them  as 
Zi( i),  ....  zi(ni)’  then  take  the  average 


E*iU) 


g.  =  J'*1 

and  let  af  =  1/n*.  Then  use  (^.y*),  i=l . ti  as  the  data  with  relative 


weights  (erf,  .  .  .  ,<7^).  The  grid  of  points  Sj . sk  can  always  be  chosen  so 

that Sj ,  i  =  l . n  ;  =  1 . k. 

In  the  example  with  real  data  in  section  4.2  the  replicates  were  handled 
using  the  strategy  mentioned  above. 


CHAPTER  4 


MONTE  CARLO  EXPERIMENTS  AND  EXAMPLES 

4. 1  Comparison  of  linear,  quadratic  and  spline  discrimination 

In  this  section  we  compare  the  performance  of  the  two  parametric  models 
most  commonly  used  with  the  spline  model. 

Design  at  the  simulation  study 

Our  3tudy  is  restricted  to  continuous  bivariate  data  and  to  discrimination 
between  two  populations  Aj  and  Az.  The  prior  probabilities  are  taken  to  be 
equal  and  so  are  the  sample  sizes  for  the  training  samples. 

As  in  chapter  2,  let  Yv  ...  ,Yn,  n=7ij+n.2,  denote  the  combined  sample 
from  the  two  populations  and  define 

1  if  ^e/li 
**■  ~  [0  if  YiZAz 

Then,  since  for  the  simulation  study  the  priors  and  the  sample  sizes  rij  and  n2 
are  equal,  we  have  that 

nz^r^]‘7M^T=T{y)"h{y)-  <41I) 

The  three  models  that  we  will  consider  are  the  following: 

L- LINEAR:  The  densities  f  x  and  /2  in  (4.1.1)  are  assumed  to  be  bivariate 

Normal  with  possibly  different  means  and  the  same  variance 
covariance  matrix  2.  The  means  are  estimated  by  the  sample 
means  and  E  is  estimated  by  the  pooled  sample  variance  covari¬ 
ance  matrix.  This  gives  rise  to  linear  discriminant  analysis  which 
is  frequently  applied.  See  for  example  BMDP  ( 1975). 
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Q- QUADRATIC:  The  densities  f  j  and  /  2  in.  (4.1.1)  sure  assumed  to  be  bivariate 
Normals  with  possibly  different  mean  vectors  and  variance 
covariance  matrices  Ei  and  Eg-  The  means  are  estimated  by  the 
sample  means  and  Ej  and  Eg  are  estimated  by  the  sample  vari¬ 
ance  covariance  matrices. 

S-  SPLINE:  The  posterior  probability  p  (y )  in  (4.1.1)  is  estimated  directly  by 

minimizing 

a^2(p) 

ni«l 

subject  to 

Osjstej)  s;  1.  j  =  1 . fc, 

where  J2(p)  is  given  by  (2.2.8)  with  7n=2  and  the  smoothing 
parameter  A  is  estimated  by  generalized  cross-validation  for  con¬ 
strained  problems  its  described  in  section  2.3. 

The  spline  was  computed  as  follows:  First  the  unconstrained  problem  was 
solved  and  an  estimate  was  obtained  using  Wendelberger's  (1981)  algo¬ 
rithm.  A  regular  grid  of  15x15  points  was  constructed  in  the  range  of  the  data 
and  the  unconstrained  spline  was  evaluated  at  these  225  points.  If  the  con¬ 
straints  were  violated  at  some  of  these  225  points,  then  the  constrained  problem 

was  solved  using  A  from  the  unconstrained  problem  as  initial  value  for  A  for  the 

constrained  problem.  The  set  of  points  Sj . s*  at  which  the  constraints 

were  enforced  consisted  of  the  subset  of  of  the  225  points  at  which  either 
fn\ >0.9  or  f%x<Q.l.  In  all  our  simulations  we  restricted  k  to  be  less  than  100. 

It  would  have  been  desirable  to  compare  the  spline  model  with  the  Kernel 
model  of  Habbema,  Hermans  and  Van  den  Broek  (1974),  unfortunately  we  could 
not  get  their  software. 
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In  each  simulation,  training  data  were  generated  to  estimate  the  posterior 
probability  (4.1.1)  according  to  each  of  the  three  models.  We  fixed  the  sample 
size  at  n  1  =  n2  -  70.  Also  100  additional  data  points  were  generated  from  each 
population  to  serve  as  test  data. 

The  IMSL  (1982)  routines  GGNSM  and  GGUBS  were  used  to  simulate  the  data 
from  four  different  types  of  distributions.  Using  the  notation  N2(/j.i.fiz,crn,(7zi) 
for  the  uncorrelated  bivariate  normal,  the  four  types  of  distributions  are  given 
in  Table  4.1. 

Density  contours  for  each  type  of  distribution  are  given  in  Figure  4.1.1 

For  each  type  of  distribution  four  simulations  were  done.  In  each  simula¬ 
tion.  140  training  and  200  test  observations  were  generated  and  the  values  of  six 
measures  of  performance  were  calculated  for  each  model.  These  values  were 
averaged  over  the  four  simulations  and  their  standard  deviations  were  com¬ 
puted.  For  distribution  type  three  six  more  simulations  were  done  with  training 
samples  of  size  50  for  each  group  and  three  simulations  with  training  samples  of 
size  90. 


TABLE  4.1.1 

i _ i 

Sample 

/ 1 

/  2 

1 

JV2(0,0,l.n 

iV2(2, 0,1,1) 

I  2 

Nglo.0. 1.1) 

A'2(5.0,16,16)  | 

3 

Nz(0.0, 1.1) 

l^2(1.5,-2.5,l.l)  +  ^V2(l.5,2.5,l.l)  j 

4 

^V2(0,5.l.l) 

tf2(0,0,l6,l6)  : 

i _ 

+  T^z(0.-5,1,1) 

- 1 

Density  Contours 


Measures  of  Performance 

We  use  misclassification  rate  as  a  criteria  for  measuring  how  well  each  of 
the  three  models  performs.  The  simplest  way  of  estimating  misclassification 
rate  is  by  the  proportion  of  training  samples  that  are  misclassified  However, 
this  leads  to  an  optimistic  result,  and  unless  the  training  sample  is  perfectly 
representative  of  the  population,  the  classifier  will  reflect  peculiarities  of  the 
sample  at  hand  that  do  not  exist  in  the  population.  The  error  rate  calculated  by 
reclassifying  the  design  set  is  known  as  "apparent  error  rate".  The  "true  error 
rate"  of  a  classifier  is  the  expected  error  rate  of  this  classifier  on  future  samples 
from  the  same  population. 

One  of  the  advantages  of  a  simulation  study  is  that  we  can  generate  a  test 
sample  from  the  same  distribution  of  the  training  sample  and  get  a  better  esti¬ 
mate  of  the  true  misclassification  rate  from  the  proportion  of  test  elements  that 
were  misclassified. 

Several  authors  like  Gilbert  (i960),  Lachenbrook  and  Mickey  (1960),  Marks 
and  Dunn  (1974),  Goldstam  (1975),  Van  Ness  and  Simpson  (1976),  Aitchison, 
Habbema  and  Kay  (1977)  and  Remme,  Habbema  and  Hermans  (1900)  have 
evaluated  various  discriminant  analysis  models. 

We  choose  to  use  the  same  measures  of  performance  that  Remme, 
Habbema  and  Hermans  (1900)  used  to  compare  their  Kernel  model  with  the 
linear  and  quadratic  models.  All  this  performance  measures  are  computed  on 
the  test  sample 

Two  kinds  of  allocation  rules  are  considered.  One  is  the  "forced  allocation 
rule"  (FAR)  which  consists  of  allocating  the  test  element  to  the  population  with 
higher  posterior  probability,  that  is,  the  test  element  t  is  allocated  to  /Ij  if 

p(£)>0.5  and  to  <42  otherwise  This  kind  of  rule  does  not  take  into  account 
differences  in  the  posterior  probabilities  between  for  example  0.55  and  0  95 


This  lead  us  to  the  consideration  of  a  "doubt  allocation  rule"  (DAR).  This  alloca¬ 
tion  rule  is  as  follows:  allocate  test  element  t  to  population  <4i  if  p(t )  is  greater 
than  some  prespecifled  threshold  value  <5>0.5;  allocate  t  to  j42  p(0<l— 6  and 

allocate  f  to  a  "doubt  category”  if  1  — <5  ^  p(t)  ^  6.  The  threshold  value  used  in 
our  simulations  is  <5=0.9. 

We  have  two  groups  of  performance  measures.  The  first  group  is  used  to 
compare  the  models  L,  Q  and  S  among  themselves,  that  is,  without  using  the 
knowledge  of  the  true  posterior  probabilities.  These  first  group  of  measures  are: 

PI:  Percentage  of  test  elements  allocated  to  population  of  origin  using  FAR. 

P2:  Percentage  of  test  elements  allocated  to  population  of  origin  using  DAR. 

P3:  Percentage  of  test  elements  allocated  to  population  from  which  they  did 
not  originate  using  DAR. 

The  following  three  measures  compare  each  estimate  of  the  posterior  pro¬ 
bability  under  the  three  models  with  the  true  posterior  probability.  These 
measures  are  more  adequate  for  evaluating  the  estimation  of  the  posterior  pro¬ 
bability.  These  measures  are: 

P4:  Percentage  of  test  elements  allocated  to  the  same  population  with  both  the 
true  and  the  estimated  posterior  probability  using  (FAR). 

P5:  Percentage  of  test  elements  for  which  there  is  "strong  agreement"  between 
true  and  estimated  posterior  probability  using  DAR  (see  bellow). 

P6:  Percentage  of  test  elements  for  which  there  is  "strong  disagreement" 
between  the  true  and  estimated  posterior  probabilities  using  DAR  (see  bel¬ 
low). 

Measures  P5  and  P6  are  computed  as  follows.  For  each  test  element  p{t ) 
and  p(£)  were  computed  and  the  test  element  was  allocated  to  one  of  the  9 
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categories  in  table  4.1.2  according  to  the  values  of  p(t)  andp(£)  Then,  P5  ls 
the  computed  as  the  percentage  of  test  elements  that  fall  in  classes  c  u,  c3 z  and 
c33-  P8  is  the  percentage  of  test  elements  that  fall  in  classes  c31  and  c  13. 


TABLE  4.1.2 

P 

ro.o.o.n 

(0.1. 0.9) 

ro.9.1.01 

[0.0,0. 1] 

cn 

c12 

c  13 

P~ 

(0.1, 0.9) 

C21 

c22 

c23 

[0.9, 1.0] 

C31 

c32 

c33 

Results 

For  each  of  the  four  types  of  distributions  the  average  values  of  Pi,  P2.  P3. 
P4,  P5  and  P6  over  the  four  simulations  are  given  in  tables  4.1.3  through  4.1.6. 
The  column  labeled  U.S.  in  these  tables  corresponds  to  the  unconstrained  spline 

This  information  is  summarized  in  figures  4.1.2  through  4.8.7 


TABLE  4.1.3 


Distribution  Type  1 


Perf. 

L 

Stan. 

Dev. 

0 

Stan. 

Dev. 

S 

Stan. 

Dev,  j  U.S. 

PI 

82.50 

2.58 

81.38 

2.95 

81.25 

3.40  81.50 

P2 

65.13 

1.80 

65.13 

1.38 

62.25 

3.01  61.00 

P3 

4.13 

1.25 

4.38 

1.30 

3.50 

2.04  2.75 

P4 

97  75 

1.19 

96.88 

0.75 

95.75 

2.78  ;  96.25 

P5 

90.63 

2.63 

88.50 

5.74 

88.75 

7  79  1  92.25 

-  P6 _ 

o.oo 

0.00 

Q..P.Q 

0.00 

0.00 

Stan.  ! 
Dev.  I 
2.55  | 
2.74  | 
1.19 
2.22  | 
3.38  : 
0.00 
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TABLE  4.1.4 
Distribution  Type  2 


PI 

86.00 

P2 

43.63 

P3 

0.75 

P4 

88.63 

P5 

41.63 

_ ES _ 

1.08  I  90.13 
79.38 
3.63 


1.44  97.25 

10.36  95.88 


1.03 

92.38 

9.60 

68.25 

0.75 

0.50 

1.08 

10.90 

95.50 

75.38 

0-91 
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TABLE  4.1.5 
Distribution  Type  3 


Pi 

P2 

P3 

P4 

P5 

69.25 

38.38 

0.38 

71.83 

41.83 

2.40 

2.46 

0.48 

2.86 

3.57 

84.50 

52.38 
0.50 

90.88 

70.38 

_ ES _ 

HkSI 

Stan. 


1*5 


53.25 

26.13 

0.00 

51.50 

35.25 


Mvl 


TABLE  4.1.6 
Distribution  Type  4 


88.13 

60.25 

0.50 

91.38 

55.63 


[•M'l 


[IM4 


92. 13 

80.63 

2.13 
94.38 

73.25 


[•JP-id 


I  Stan. 


[•MU 


93.63 

81.25 
1.00 

95.63 

77.25 


It  is  clear  from  this  figures  that  the  linear  model  performs  well  only  with 
samples  from  distribution  type  1,  that  is,  when  both  samples  come  from  bivari¬ 
ate  normals  with  the  same  variance  covariance  matrix.  Even  in  this  case  the 
performance  of  the  quadratic  or  the  spline  models  is  almost  as  good  as  the  per¬ 
formance  of  the  linear  model.  When  the  samples  are  generated  from  two  bivari¬ 
ate  normals  with  different  variance  covariance  matrices,  the  quadratic  model 


Distribution  Type 

Figure  4.1.2*  Percentage  of  tost  alsnant*  classified 
correctly  using  Forced  Allocation  Rule. 

S—  Const.  Spline.  0-  -  Quadratic.  L.  .  .  Linear. 


Distribution  Type 


Figure  4.1.3.  Percentage  of  test  elements  classified 


corrsctly  using  Doubt  Allocation  Rule. 


S -  Const.  Splins.  Q-  -  Quadratic.  L. . .  Linear. 


Distribution  Typo 

Figure  4. 1.4.  Per cant ago  of  test  elements  classified 
lnearrsetly  using  Doubt  Allocation  Rule. 

S - Const.  Spline.  0-  -  Quadratic,  L. .  .  Linear. 


Distribution  Type 

Figure  4. 1.5.  Percentage  of  test  elements  classified  to 
same  category  with  true  and  estimated  P. P.  <F.  A.  R.  )  . 
S—  Const.  Spline.  Q-  -  Quadratic.  L.  .  .  Linear. 


Distribution  Typo 

Figure  4.1.6.  Percentage  of  toot  a lament*  classified  to 
sama  category  with  true  and  estimated  P. P.  <□. A. R. ) . 

S - Const.  Spline.  0-  -  Quadratic.  L.  .  .  Linear. 


Distribution  Type 

Figure  4.1.7.  Percentage  of  strong  disagreement  between 
true  and  estimated  Posterior  Probability  (0. A. R. ) 
S—  Const.  Spline,  Q-  -  Quadratic.  L.  .  .  Linear. 
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performed  better  than,  the  spline  as  expected,  however  the  spline  was  very 
closed  to  the  quadratic.  In  samples  from  distributions  type  3  and  4  the  spline 
was  consistently  better  than  the  quadratic.  The  difference  between  the  spline 
and  the  quadratic  was  larger  when  the  doubt  allocation  rule  was  used. 

The  spline  did  particularly  well  in  measures  P4,  P5  and  P6  which  give  a 
better  idea  of  how  well  the  estimate  approximates  the  true  function. 

From  table  4.1.3  through  4.1.6  we  can  also  see  that  in  terms  of  classification 
there  is  not  too  much  difference  between  the  unconstrained  and  the  constrained 
spline,  being  the  performance  of  the  constrained  spline  slightly  better  in  most 
cases. 

In  the  case  of  distribution  type  3,  six  more  samples  were  generated  to  see 
the  effect  of  the  sample  si2e  on  the  spline  estimate.  The  first  three  simulations 
were  done  with  n1  =  n2  =  25  and  the  last  three  with  n  j  =  n,2  =  45.  The  results 
of  these  simulations  together  with  the  results  for  nl  =  n2  =  70  are  given  in 
tables  4.1.7  and  4.1.8  and  summarized  in  figures  4.1.8  through  4.1.13.  In  all  the 
measures  the  spline  performed  better  than  the  quadratic  and  linear  models  for 
the  three  different  sample  sizes. 

In  figures  4.1.14  to  4.1.33  we  present,  for  one  of  the  simulations  for  each 
type  of  distribution,  plots  of  the  approximate  generalized  cross-validation  func- 
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Distribution  Type  3,  Sample  Size 
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TABLE  4.1.B 
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Distribution  Type  3,  Sample  Size  :  90 


HZ2HI 

i. 

Stan. 

Dev. 
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pi 
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0.50 

P2 
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P3 
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0.29 
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0.29 
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0.29 

0.83 

0.29 

P4 
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P5 

45.50 
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tion.  contour  plot  of  spline  and  quadratic  estimates  and  the  true  posterior  pro¬ 
bability  and  surface  plots  for  the  true  posterior  probability  and  its  spline  esti- 
I  mate.  In  the  case  of  distribution  types  3  and  4  we  also  present  contour  plots  of 

the  quadratic  estimate  together  with  the  true  function. 


In  the  plots  of  the  generalized  cross-validation  function  we  use  two  different 
symbols:  and  "o”  to  indicate  when  the  set  of  active  constraints  changes.  If 

for  two  consecutive  values  of  A  the  symbol  changes,  this  indicates  that  the  set  of 
active  constraints  is  different  for  those  two  values  of  X. 

In  the  contour  plots  we  use  a  solid  line  for  the  contours  of  the  constrained 
spline,  dashed  line  ( — )  for  the  contours  of  the  true  posterior  probability  and  a 
dash-dot  line  (-.-)  for  the  contours  of  the  quadratic  estimate.  The  data  is 
presented  in  the  same  plot.  A  dark  triangle  in  the  contour  plot  indicates  the 
viewpoint  for  the  surface  plot. 


Distribution  Typa  3 


- a 


Samp Is  Siza 

Figure  4. 1. 9t  Psrcsntags  af  tsst  elements  clossif iad 
corraetly  using  Forcad  Allocation  Rula. 

S -  Const.  Splina.  Q-  -  Quadratic.  L.  .  .  Llnaar. 

Distribution  Typa  3 


Sompla  Siza 

Figura  4. 1.9.  Parcantoga  of  tast  alamants  classified 
corraetly  using  Doubt  Allocation  Rule. 

S -  Const.  Splina.  Q-  -  Quadratic.  L. . .  Linear. 


0 i str i but 1 on  Type  3 


Samp la  Size 

Flgur*  4.  1.  1.0.  Parcantaga  of  tait  alamants  classified 
incorractly  using  Doubt.  Allocation  Rula. 

S - Const.  Splina.  0-  -  Quadratic.  L.  .  .  L inoar. 

Distribution  Typa  3 


Sample  Size 

Figure  4. 1. 11.  Parcantaga  of  test  alamants  classified  to 
same  category  with  true  and  estimated  P. P.  (F.  A.  R.  >  . 

S — -  Const.  Spline,  Q-  -  Quadratic.  L. . .  Linear. 


Distribution  Typa  3 


Samp la  Siza 

Figure  4. 1. 12.  Rareantaga  of  taat  elements  classified  to 
•ana  cotagory  with  trua  and  aatimatad  P. P.  CD.  A.  R.  )  . 
S—  Conat.  Spllna.  0-  -  Quadratic.  L.  .  .  Linear. 

Distribution  Typa  3 


Sample  Size 

Figure  4. 1. 13.  Percentage  of  strong  disagreement  between 
true  and  estimated  Posterior  Probability  CD. A. R. ) 

S - Const.  Spline.  Q-  -  Quadratic.  L.  .  .  Linear. 


log (lambda) 

Figure  4.1.22:  Approximate  GCVC,  distribution 

type  3 . 


Figure  4.1.23:  Constrained  spline  ( - ),  true 

P.  ?.(-  -)  and  data  (+:  sample  1,  o:  sample  2) 


Figure  4.1.27:  Constrained  spline  , 
distribution  type  3. 
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4.2  Other  simulation  results 

In  this  section  we  present  two  examples  that  were  generated  from  univari¬ 
ate  distributions  and  we  give  some  computation  times  for  the  simulations  of  sec¬ 
tion  4. 1 . 

In  our  first  univariate  example,  sample  1  was  generated  from  a  Normal  dis¬ 
tribution  with  mean  zero  and  variance  one.  and  sample  2  from  a  Normal  distri¬ 
bution  with  mean  1.5  and  variance  1.  In  figures  4.2.1  and  4.2.2  we  present  the 
unconstrained  and  constrained  spline  estimates,  together  with  the  quadratic 
estimate,  true  posterior  probability  and  data. 

In  the  second  univariate  example,  sample  1  was  generated  from 
0.3JV(0,0.25)+0.7iV(0.16),  and  sample  2  from  a  IV(0,0.25).  The  corresponding 
plots  are  given  in  figures  4.2.3  and  4.2.4. 

In  figures  4.2.2,  4.2.3,  4.2.5,  and  4.2.6  it  is  clear  that  tor  classification  pur¬ 
poses  there  is  basically  no  difference  between  the  unconstrained  and  the  con¬ 
strained  splines,  but  certainly,  we  would  not  want  to  show  a  client  a  plot  of  an 
estimated  posterior  probability  that  can  take  values  greater  than  one  or  smaller 
than  zero. 

The  time  of  computation  of  the  spline  depends  on  the  sample  size 
n=n1+7i2,  the  number  of  constraints  fc,  the  number  of  values  of  X  to  be  con¬ 
sidered  for  evaluation  of  the  generalized  cross-validation  function  and  the 
number  of  times  that  the  active  set  of  constraints  changes  from  one  value  of  X 
to  the  next. 

In  table  4.2.1  we  present  the  average  C  PU  time  used  by  the  routine 
DSCOMP  to  compute  the  spline  in  the  simulations  of  section  4.1.  In  table  4.2.1 
the  column  labeled  "nc"  contains  the  average  number  of  constraints  enforced  in 
the  simulations,  the  column  labeled  ”%Time  In  G.E.P."  contains  the  percentage 


Figure  4.2.4:  Constrained  spline  (-  - 

quadratic  ( —  — ) ,  true  ( - )  and  data 

n. *n«*45 . 
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of  time  spent  in  the  generalized  eigenvalue  problem  and  the  column  labeled 
"%Time  in  Q.P."  contains  the  percentage  of  time  spent  in  the  solution  of  the  qua¬ 
dratic  programming  problem.  , 


TABLE  4.2.1 

Dist. 

wm aw 

Type 

■n 

ReVIli 

mm 

140 

96 

252.03 

43.5 

40.2 

140 

94 

230.45 

43.1 

46.0 

KB 

50 

78 

54.15 

21.6 

62.9 

mm 

90 

65 

63.33 

32.3 

56.7 

mm 

140 

67 

184.24 

54.1 

34.9 

;1B 

-L4Q  ... 

_25_ 

_2S2JJ _ 

47.5 

_ £LZ _ 

From  this  table  we  can  see  that  on  the  average  47%  of  the  computation  time 
is  spent  on  the  solution  to  the  generalized  eigenvalue  problem  and  40%  in  the 
solution  of  the  quadratic  programming  problem  except  in  the  small  sample 
cases  (n=50,  90).  In  the  n=50  case  62.9%  of  the  CPU  time  was  spent  in  the 
solution  of  the  quadratic  programming  problem  while  only  21.6%  of  the  time  was 
spent  in  the  generalized  eigenvalue  problem.  In  the  n=90  case.  56.7%  of  the 
time  was  spent  in  the  quadratic  programming  problem  and  32.3%  in  the  general¬ 
ized  eigenvalue  problem.  The  reason  for  this  is  that  the  quadratic  programming 
routine  works  better  when  the  number  of  constraints  in  the  active  set  is  small 
compared  with  the  number  of  observations.  The  algorithm  to  solve  the  general¬ 
ized  eigenvalue  problem  is  an  iV3  algorithm,  where  n—M^N^n+k—M,  there¬ 
fore  it  would  be  very  expensive  to  use  the  routine  DSCOMP  to  compute  the  con¬ 
strained  spline  for  very  large  n.  In  our  simulation  study  even  with  n  =50  we  got 
pretty  good  results  and  in  distributions  type  3  and  4  the  spline  was  consistently 
better  than  the  quadratic  model  for  n  =  50,  90  and  140,  however  we  do  not 
recommend  its  use  for  very  small  n.  In  chapter  6  we  will  discuss  a  possible 
alternative  for  large  sample  sizes. 


Here  we  apply  our  method  of  estimating  the  posterior  probabilities  to 
some  results  of  a  psychological  test  of  a  group  of  25  normal  persons  and  25 
psychotics.  We  obtained  this  set  of  data  from  Smith  (1947).  The  psychotics  will 
be  population  Aj  and  the  normals  Az.  The  two  covariates  are  an  unweighted 
total  score  or  "size”  obtained  by  Penrose's  method  (Penrose  1945)  and  a 
weighted  score  related  to  shape. 

In  figure  4.3.2  we  present  the  data  and  the  contour  levels  for  the  spline 
estimate  (solid  lines)  of  the  probability  that  a  given  subject  belongs  to  the 
psychotic  group  given  its  particular  measurements  of  size  and  shape  From  this 
figure  it  appears  that  the  spline  estimate  gives  a  more  accurate  representation 
of  the  data  than  the  quadratic  estimate. 

In  figure  4.3.3  we  present  a  view  of  the  surface  of  the  spline  estimate 
from  the  point  indicated  by  a  dark  triangle  in  the  contour  plot  4.3.2.  The 
approximate  generalized  cross-validation  function  is  given  in  figure  4.3. 1. 
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and  similarly: 


P(2i-0  I  J(=Vi)  = 

So  that  if  Zj,  .  .  .  ,  zn  are  the  observed  values  of  Zj, 
tional  likelihood  for  r  is  given  by 


(5.1.4) 


Zn.  the  condi- 


and  hence,  the  log-likelihood  is  given  by 


(5.1.5) 


logL(r)  =  £  zt logr (yt ) -log [l+r^)] 

i-l 


(5.1.8) 


A  maximum  likelihood  approach  to  estimate  r  would  be  to  maximize  (5.1.6). 
The  fact  that  L{r )  is  undefined  for  negative  values  of  r(t/i),  while,  for  any  i  such 
that  zi  is  zero,  the  likelihood  increases  as  r  (yt)  tends  to  zero  can  lead  to  com¬ 
putational  difficulties  if  the  estimation  of  r  is  considered  directly.  To  avoid  this. 


consider  the  estimation  of  the  logarithm  of  r  and  let 

9  =  log  r. 

then  the  conditional  likelihood  L*  of  g  becomes 


(5.1.7) 


R  exp[y(Vi)] 

i'(s)  =  n  i+«p(»(»,)] 


and  the  log-likelihood  is 


logX#(£f)  =  2  ^(yJ-logJl+eaptyCy^] 

If  we  wanted  to  maximize  (5.1.8),  we  could  take 


(5.1.8) 


00  if  Zi  =  1 
9(yi)  ~  —CO  if  zi  =  O' 

To  avoid  this  undesirable  solution  we  should  use  the  underlying  assumption  that 
g  is,  in  some  sense,  nc  too  rough  Therefore  we  should  penalize  the  likelihood 
according  to  the  roughness  of  g . 


This  device  of  penalizing  for  roughness  has  been  used  in  different  contexts 
by  Reinsch  (1967),  Good  and  Gaskins  (1971)  and  other  subsequent  authors. 

We  will  assume  that  the  function  g  is  in  the  reproducing  kernel  Hilbert 
space  H(m,d)  defined  in  section  2.2.  The  "penalized  likelihood"  estimate  of  g  is 
the  function  that  minimizes 

A(s)  =  -logZ,*(g)+A/m(g),  (5.1.9) 

where  Jm  is  given  by  (2.2.8).  In  the  particular  case  where  m=  2  and  d=l  ,  I\(g) 

becomes 

E  [log(l+ea7)[^(yt)])-*iff(y<)]  + 

which  is  the  expression  that  Silverman  (1978)  minimizes  to  estimate  g . 

5.2  Existence  and  uniqueness  of  an  estimator 

In  this  section  we  discuss  the  existence  and  uniqueness  of  the  mimmizer  of 
I\(g)  in  and  we  characterize  it  as  a  Laplacian  Smoothing  Spline.  The 

optimization  theoretic  results  needed  here  are  given  in  appendix  Al. 

As  in  chapter  2.  let  Ho(m,d)  be  the  space  of  polynomials  of  degree  less 
than  m  defined  on  R*.  and  let  H^m.d)  be  the  orthogonal  complement  cf 
//0(m  ,d)  in  Hi(m  ,d). 

First  we  establish  the  uniqueness  of  a  minimizer  of  I\(g)  in  the  following 

5.2. 1  Lemma 

The  minimizer  ox  in  H(m  ,d )  of  Ix(g),  if  it  exists  is  unique. 


Let  f  €.H(m,d)  and  define 


then,  by  proposition  A1.4,  ire  have  that  the  2  th  Gateaux  variation  of  I\(g)  in  the 
direction  (J . / )  is  given  by 

i^Hg)U . /)*  ~<(OI«-o- 

dxJ 

Note  that  Jm(g)  -  \\Pg \\f/  where  P  is  a  projection  operator  into  the  space 
H \(m  ,d).  then,  we  compute  the  first  and  second  Gateaux  variations  as  follows. 

^J<(0  =  ^  E  *i[0(Vi)+V(%)l  -  logU+expIXyt )+£/(!/*)!) 

+  ^\<g+tf  ,g+tf> 

5 

_  A  _  f  .  x  /(tt)«p[g(lfc)+*/(«i)] 

1 +exp[g(yi^tf(yi)] 

+2 \<f  ,g>  +  2Af  </  ,/  > 


And  differentiating  with  respect  to  t  again  we  obtain: 


d.2  t/ix  _  *  /(Vi)exp[g(Vi )+«/(%)]  .  ^  ^ 

s^c ' '  k  ittvUwwm  zx  1 J 


Finally,  evaluating  at  t=0  we  get: 


>OV/E  0. 


Then,  by  proposition  A1.5  I\  is  strictly  convex,  and  hence,  if  it  has  a  minimum,  it 
is  unique. 


We  now  establish  sufficient  and  necessary  conditions  for  the  existence  of  a 


minimizer  of  I\  in  the  space  Ho(m  ,d ). 

5.2.2  Lemma 

Let  Hq(tti  ,d )  be  the  space  of  polynomials  of  degree  less  than  m  defined  in 
If*.  The  minimizer  of  I\  exists  and  is  unique  if  and  only  if  ther "  is  no  level  curve 
of  an  element  of  //o(m  »d)  that  completely  separates  the  samples. 


Proof 


Let  g eHo(m,d).  then,  since  Jm(g)  =  0, 


I\(g)  =  S  logd+eapteCy^j-ZiS  (t/J 


Now.  there  exists  a  level  curve  of  a  polynomial  that  completely  separates 
the  samples  if  and  only  if  there  exists  g  *e//0(m  ,d )  such  that 


g'iyi)  = 


>0  if  Vi  eA  j 
<0  if  yi  e-d2- 


(5.2.1) 


We  first  show  the  if  part.  Suppose  that  there  is  no  ye/f0(m,ti)  satisfying  (5.2.1), 
then  for  all  g  €//0(m  .d)  there  exists  at  least  one  yj,  j  =  1 . n  such  that 


ff(yj)  « 


<0  ifty-edi 
>0  ify;€42' 


Now, 


h(g)=  t  py(i+e*p[y  (%)]-**£  (y») 


<-I 


zslogil+explgiyi^-Zjgiyj)). 

So,  if  yy€i4,.  g(y j-)<0  =>  /*(y)-»«  as  |ly||tfo-*»  and  if  x/,e42. 
g(yj)> 0  =>  Iiig)-*00  as  llylly,-*®  Therefore,  by  proposition  Al. 7  there  exists 

fli€//o(m,d)  that  minimizes  I\(g)  in  Ho(m,d)  and  by  Lemma  5.2.1  such  g  is 


We  now  show  the  only  if  part:  assume  that  there  is  a  unique  pei/0(m,d)  that 
minimizes  /x  and  suppose  that  there  exists  g  *  that  completely  separates  the 

samples,  that  is,  g *  satisfies  (5.2.1).  Since  g  minimizes  /x.  g  maximizes 
exp  [-/x(flr)] 

=>exp[-/x(p)]  *  exp[~I\(g)]  V  ge.H0(m,d). 


0_r_,  -  n  exPt^(i/i)]  rr  exp [-9(yi)] 

x  ~  ill  l+exp[^(T/i)]  JJo  l+exp[-y(yi)] 

and  looking  at  each  term  in  the  above  expression  we  see  that 
exp[r]/(l+ezp  [r])  goes  from  0  to  ^-whenr  goes  from  — to  0  and  it  goes  to 

1  whenr  goes  from  0  to  »  and  necessarily  exp  [— /x(y )]  <  1.  Now,  for  arbitrary 
t»0  consider 

_,r- T  -  rr  a3Pl*9'(yi)]  rr  exp[-i?<7*(yQ] 

x  *,«!  l+ezp^y'fo*)]  %=o 

Since  g  *  satisfies  (5.2.1)  both  products  tend  to  1  as  so  that  for  iJ 

sufficiently  large  we  have 

*)]  >  *®P  [-/*(£)]. 

which  contradicts  the  assumption  that  g  is  the  unique  minimizer  of  I\{g) 


In  Theorems  5.2.1  and  5.2.2  below,  we  establish  conditions  for  existence  and 
uniqueness  of  the  minimizer  of  /x(p)  in  H{m,d)  and  characterize  the  solution 
as  a  Laplacian  smoothing  spline. 


5.2.1  Theorem 


The  minimizer  ffx  of  /x(ff)  in  H(m,d)  exists  and  is  unique  provided  that 
there  is  no  level  curve  of  a  polynomial  of  degree  less  than  m  that  completely 
separates  the  samples. 


Proof 


Uniqueness  follows  from  lemma  5.2.1.  By  lemma  5.2.2  we  only  need  to  show 
that  if  the  minimizer  of  I\(g)  exists  in  Ho(m,d )  then  the  minimizer  exists  in 
H{m,d). 

Any  function  ff€//(m,d)  can  be  written  as  g  =  ffo  +  yi  where 
ff0e//0(m,d)  and  g^H^m.d).  Now  write  /x  as 


I\  (ff)  =  Slog' 


i+Mp[so(yi)+si(y»)] 


-  E  *i[yo(yi)+yi(yi)]  +  a|Is,||2 


i«l 

^  Ally, II2. 

so  that  if  1 1<;  i||-*oo  /x(ff)-*oo  and  by  proposition  A1.7  this  implies  that  /x  has 
a  minimum  so  it  will  suffice  to  show  that  /x  attains  a  minimum  in 


H’  =  H0(m,d)9Hl(m,d), 

where 

//i (m,d)  =  :  HffjlliA’i  for  some  Kx  >  0  . 

Let  ff  €//  *,  1  or  all  y  eR*  we  have 

ly,(y)l  *  Kz\\gi\\*Kz.  (5.2.2) 

Again  write  ff=yo+yi.  where  now  ffj  €  //*(m,d),  then,  adding  and  sub- 
n 

tracting  £  toff  (l  +  exp[ff  0(yi)])  from  /x  and  using  (5.2.2)  we  get: 


h(g)  *  £iog 


i+ggp[g0(ih)+gifa)] 


+  I\(go)  -nK^+XK^.  (5.2.3) 


1+exp  [g0(Vi)] 

Looking  at  an  individual  term  in  the  summation  in  (5.2.3)  note  that 

l+exp[go(yi)+gi(3/i)] 


9 i(Vi)  >  0  =*  log 
and  it  9 i(yi)  <  0.  then 


l+exp[y0(l/t)] 


2i  0 


l^i(3/i)l  =  -0i(yi)  <  Kz 

=>0i(2/i)  >  -*3 


so  that 


log 


H-eay[y0(yi)+y1(yi)]  [  H-eap  [yo(yi)-/r3] 

l+eap[y0(yi)]  [  l+exp[y0(yt)] 


=  iogj- 


exp[-A’3](exp[^]+exp[y  (,(% )]) 


l+exp[y0(l/i)] 

>  -Kz 

then  (5.2.4)  and  (5.2.5)  imply  that 

l+exp[y0(yi)+y1(yi)]f 


Slog 

i-1 


1+exp  [y0(yt)] 
Hence,  by  (5.2.3)  and  (5.2.6)  we  have  that 


a  — 


/x(ff )  *  hia  o)  +  ^ 


lip  I\(9)  >  ..  ’ip  I\(9o)+K* 

oil-*—  too*i?1|-“ 


to  II- 


=  lip  Ix(9o)+K*  =  «■ 


(5.2.4) 


(5.2.5) 


(5.2.6) 


since  I\(go)  attains  a  unique  minimum  in  //0(m ,d).  Therefore,  by  proposition 
A1.7  lx  attains  a  minimum  in 


Silverman  (personal  communication)  has  previously  conjectured  theorem 
5.2.1  and  has  also  noted  a  rather  elegant  property  of  the  minimizer  of  /*(y):  As 

A-*80.  9x  tends  to  an  element  of  the  null  space  of  Jm,  so  that  for  m=2,  the 


estimated  log-likelihood  ratio  will  be  linear  and  for  m  =3  it  will  be  quadratic. 
Thus,  the  parametric  estimate  for  multivariate  normals  is  included  as  a  limiting 
case.  (Compare  Wahba  1978  and  also  Silverman.  1982). 

The  following  theorem  characterizes  the  minimizer  of  I\(g)  as  a  Laplacian 
smoothing  spline.  The  proof  of  this  result  is  straightforward  and  follows  Wahba 
and  Wendelberger  (1982)  and  section  2.2  of  this  thesis,  and  therefore  is  not  given 
here. 

Let  Em.  <pl . (fiy  be  as  in  section  2.2. 

5.2.2  Theorem 

The  minimizer  g\  of  I\(g )  in  H(m,d)  if  it  exists  is  of  the  form 

ffx(y)  =  +.f^j(y). 

*■1  /»1 

where  the  vectors  c  =  (cAj,  ....  cn)*  and  d  =  (5j,  ....  du)1  are  the  solution 
to  the  following  non-linear  optimization 

Problem 

Minimize 

E  log  i+exp  Ecj  ^m(yi.yj)  +  £ 

i«l(  P»1 

-  E  *»  £  ^m(yi.yj)  +  E  djVjiVi 

tf-1  jm  1 

subject  to  T*  c  =0,  where  now, 

E  ■-  i  =  l . n-  j  =  1 . n 


and  T  fsP,- (y» )]  i  =  l,  .  .  .  ,n;  j  =  l . M. 
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Note  that  the  matrix  E  above  corresponds  to  the  matrix  Zjj  of  section  2.2  and 
the  matrix  T  corresponds  to  the  matrix  T j  of  the  same  section. 

To  find  the  estimate  of  g  for  a  given  value  of  X  we  must  solve  a  nonlinear 
optimization  problem  in  n  —M  variables  but  there  is  still  the  problem  of  choos¬ 
ing  the  value  of  the  smoothing  parameter. 

Since  the  conditional  distribution  of  Z*  given  Yi  =y*  is  trinomial  ( 1  ,p ) 
where  p  =  P(Zi  =  l  1 =l/i ).  then 


£[Zi\Yi=yi]  = 


exp  [*(%)] 

i+exptMyi)]  ’ 


so  that  an  ordinary  cross-validation  estimate  of  X  would  be  the  value  of  X  that 


minimizes 


,  »  .If]  J9l 

expOx  (Vg)]/{l+exp[px  (y,)]j“V<  (5-2-8) 

Jil 

where  <7x  is  the  estimate  of  g  obtained  by  minimizing  (5.2.7).  but  leaving  out 
•the  q01  observation.  Obviously  (5.2.8)  would  be  prohibitive  to  compute  since  for 
each  value  of  X  we  must  solve  n  nonlinear  programming  problems  inn—M  vari¬ 
ables. 

4 

Wahba  (personal  communication)  suggested  that  by  minimizing  I\(g )  in  a 
subspace  of  H(m,d)  consisting  of  tensor  product  B-splines  we  might  be  able  to 
get  some  computational  simplifications. 


Recently.  O'Sullivan  (1983)  has  developed  a  numerical  algorithm  together 
with  a  generalized  cross-validation  estimate  of  X  which  is  suitable  for  use  with 
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CHAPTER  8 

CONCLUSION 


6.1  Summary 

We  have  introduced  a  nonparametric  estimate  (or  the  posterior  proba- 
biiities  in  the  classification  problem.  The  smoothing  parameter  of  the  estimate 
is  determined  from  the  data  by  the  method  of  generalized  cross-validation  for 
constrained  problems. 

Our  Monte  Carlo  experiments  attest  to  the  accuracy  of  the  approxi¬ 
mate  generalized  cross-validation  function  to  choose  the  smoothing  parameter. 
Through  this  simulation  experiments  we  compared  the  spline  model  to  estimate 
posterior  probabilities  with  two  parametric  models  which  are  very  commonly 
used:  the  linear  model,  which  is  based  on  the  assumption  of  Normality  with  the 
same  variance-covariance  structure  for  the  populations  involved,  and  the  qua¬ 
dratic  model  which  is  based  on  the  assumption  of  Normality  with  possibly 
different  variance-covariance  structures.  The  linear  model  performed  well  only 
when  the  samples  were  generated  from  Normal  distributions  with  the  same 
variance-covariance  matrix.  The  spline  model  performed  almost  as  well  as  the 
quadratic  model  with  samples  generated  from  Normal  distributions  with  the 
same  or  different  variance-covariance  matrices,  and  its  performance  was  con¬ 
sistently  superior  to  that  of  the  quadratic  model  for  samples  generated  from 
non-Normal  distributions. 

It  can  be  seen  that  this  method  presents  a  nonparametric  alternative 
to  logistic  discrimination  as  well  as  to  survival  curve  estimation.  In  logistic 
regression  h(y)  is  modeled  as  the  logistic  function 


----ft  -*»■  -■ 


s 


■ 


g 


89 


My)  = 


exp 

d 

«o  +£aiy(i) 

1  +  exp 

d 

+  £a*y(i) 

(6.1.1) 


where  y  =(j/(l) . y(i))  and  the  parameters  ctQ . ad  are  estimated,  e.g.  by 

maximum  likelihood.  See  for  example,  Cox  (1966),  Day  and  Kerridge  (1967), 
Anderson  (1972),  and  Anderson  and  Blair  (1962).  Here  the  discriminant  func¬ 
tions  will  be  hyperplanes.  In  survival  curve  estimation  suppose  y  is  a  "dose"  and 
My)  is  the  probability  that  a  subject  survives  given  a  "dose"  y .  One  "observes" 
that  subject  i  has  "dose"  y*.  and  then  one  observes  a  response,  which  is  2*  =  1  if 
the  subject  survives  and  z*=0  if  the  subject  dies.  In  logistic  survival  curve  esti¬ 
mation,  a  function  of  the  form  (6.1.1)  is  fitted  to  the  data  (zi,yi).  however  it  is 
clear  that  the  constrained  spline  estimate  provides  a  nonparametric  alternative. 

The  algorithm  developed  to  estimate  the  posterior  probabilities  can  be 
used  to  solve  the  more  general 


Problem  6.1.1 

Given z*  =  /(y»),  i  =  1,  .  .  .  ,n,  flnd/n*  e  H(jn,d )  to  ninimize 
£  (y*)-*t)2  + 

subject  to 

j  =  l . k. 

The  subroutines  to  solve  problem  8.1.1  are  listed  in  appendix  A2.  They  allow 
different  measurement  error  variances,  any  dimension  between  1  and  6  and  any 
value  of  m  satisfying  2m  —  cf  >  0. 

Finally,  the  problem  of  estimating  the  likelihood  ratio  is  also  considered.  A 


A 


penalized  likelihood  estimator  is  given  and  conditions  for  existence  and  unique¬ 
ness  of  such  an  estimator  are  given  however,  no  data-based  method  to  choose 
the  smoothing  parameter  is  provided. 


6.2  Future  work 


The  results  presented  in  the  previous  chapters  are  new  and  quite  promis¬ 
ing.  There  is  no  doubt  that  there  are  many  interesting  research  problems  in 
this  area.  For  example,  we  believe  that  convergence  rates  may  be  established 
using  the  results  of  Wahba  (1979a)  and  Cox  (1982). 

More  simulation  studies  are  desirable  to  study  the  effect  of  different  sample 
sizes  in  the  spline  estimate. 

We  think  that  the  methods  outlined  in  chapters  2  and  3  can  be  extended  to 
analyze  large  data  sets  following  Bates  and  Wahba  (1983). 

We  plan  to  incorporate  the  handling  of  replicate  observations  in  the  algo¬ 
rithm  as  well  as  the  possibility  of  enforcing  linear  constraints  not  only  on  the 
values  of  the  function  but  also  on  the  values  of  the  derivatives  or  the  function. 
This  would  allow,  for  example,  enforcing  monotonicity  constraints. 
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SOME  MATHEMATICAL  OPTIMIZATION  RESULTS 
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In  this  appendix  we  present  some  of  the  properties  of  Hilbert  spaces  and 
some  results  in  mathematical  optimization  theory  that  are  used  in  this  thesis. 
We  do  not  present  the  proofs  of  these  results  because  they  can  be  found  in 
several  books.  The  results  on  Hilbert  space  can  be  found  in  Akhiezer  and  Glaz- 
man  (1961)  and  Aubin  (1979).  For  results  on  reproducing  kernel  Hilbert  spaces, 
see  Aronszajn  (1950).  The  optimization  theoretic  results  can  be  found  in  books 
such  as  Luenberger  (1969),  Ortega  and  Rheinboldt  (1970),  and  Daniel  (1971). 

A  useful  summary  of  important  results  relevant  to  this  thesis  can  be  found 
in  the  two  appendices  of 'Tapia  and  Thompson  (1976).  They  present  proofs  of 
results  that  cannot  be  easily  found  in  standard  analysis  texts. 

Review  of  Hilbert  space  properties 

Throughout  this  appendix  let  H  denote  a  Hilbert  space  of  functions  that 
map  R*-»R  Let  H*  be  the  dual  of  H,  that  is,  the  vector  space  of  all  bounded 
functionals  from  H  into  R 

A  functional  L:H-*Ris  said  to  be  bounded  if  there  exists  a  constant  C  such 
that  |  Lf  \*C\\f\\tf  for  all  /etf.  A  functional  L:H-*R  is  said  to  be  continuous 

at  J  fZH  if  jfnJc#  and  /w-*/  =*  •  L  is  said  to  be  continuous  in 

S cH  if  it  is  continuous  at  each  point  in  5 . 


Al.l  Proposition 


If  L  is  a  linear  functional  defined  on  H ,  then  the  following  are  equivalent: 

(i)  L  is  bounded 

(it)  L  is  continuous  in  H 

(tii)  L  is  continuous  at  one  point  in  H. 

Al.l  Theorem  (Rieaz  representation) 

If  LzH* ,  then  there  exists  a  unique  fi^-H  such  that 

moreover,  the  vector  space  H*  becomes  a  Hilbert  space  with  inner  product 
<L,G>fi»  —  <f  i,f  q>h  VL,  G&H*. 


Al.l  Definition 

A  Hilbert  space  H(  U )  of  functions  defined  on  a  set  U cK1  is  said  to  be  a  repro¬ 
ducing  kernel  Hilbert  space  (RKHS)  if  there  exists  a  reproducing  kernel  K{-,  ) 
functional  defined  on  UxU  such  that 

(i)  K(-,t)zH{U)  V  f  e£7 

and 

(ix)  /(f)  =  </,*(•/)>  V  /  e/f ( U)  and  V  f  e  £/. 

A1.2  Proposition 

H(U)  is  a  RKHS  if  and  only  if  for  all  t e  U  the  point  evaluation  at  t  is  con¬ 
tinuous,  that  is,  if  and  only  if  there  exists  Q  such  that 

1/(01*  Qlf/ll  V /£#(£/). 


A1.3  Proportion 


Let  L  be  a  continuous  linear  functional  defined  in  the  reproducing  kernel 
Hilbert  space  H(U),  and  consider  the  kernel  K(s ,t),  as  a  function  of  t,  then 
LK(s.t)  where  ij  is  the  Riesz  representer  of  L. 

Convexity  and  dlflerantial  characterizations 

A1.2  Definition 

Let  5  be  a  closed  convex  subset  of  H  and  consider  L :  H-* R  then 
(i)  L  is  convex  on  5  if 

tLf  +  (i-t)Lg  *  L(tf +(l-t)g)  Vf  e[0,l]  and  v/,  geS. 

(ti)  L  is  strictly  convex  in  S  if 
tLf  f  (l-t)Lg  >L(t/m-t)g)  Vfe(O.l)  and vf,gs.S,f*g. 

A1.3  Definition 

Let  L:  H-* R  Given  U\,  .  .  .  ,  Un,  and  /  €.H,  by  the  n01  Gateaux  variation  of 
L  at  /  in  the  directions  u1(  .  .  .  ,  a*  is  given  by 

i(">/(oi . “»)  =  Ugy|i!"_1)(/+4“n)(Ul . On-l) 

-t<— ’>(/)(«, . U.-,) 

with  L^f  —  Lf.  If  L^f  €  //*,  the  Riesz  representer  of  L^f  is  denoted  by 
VL/  and  is  called  the  gradient  of  L  at  /  ,  that  is, 

LW(J  )(u)  a  <VLf  ,u>  V  u€H. 

Al.  4  Proportion 


If  *(i)  =  L(J  +f  m)  then  *<n>(0)  =  L^>(/ )(« . u). 


Al.  4  Definition 


Let  S  be  a  convex  subset  of  H .  By  the  cone  tangent  to  S  at  /  we  mean 

Q(/ )  =  ueH  :  3 1  >0,  such  that  f  +t  u  e  5  . 

Suppose  that  L  is  twice  Gateaux  differentiable  in  S,  then  L"  Is  said  to  be  posi¬ 
tive  semideflnite  relative  to  5  if  for  each  /  eS  we  have 

L" (/)(«,«)»  0  VaeQ(/). 

We  say  that  L"  is  positive  definite  relative  to  5  if 

L”  (/)(«, u)  >  0  Vaen(/),  q*0. 

A1.5  Proposition 

Assume  that  L:  is  twice  Gateaux  differentiable  in  a  cloSted  convex 

subset  5  of  H.  Then: 

(i)  L  is  convex  in  5  V  is  positive  semideflnite  relative  to  5. 

(ii)  L  is  strictly  convex  in  S  if  L"  is  positive  definite  relative  to  S. 

Optimisation  problems  in  Hilbert  space 

We  now  consider  the  existence  and  uniqueness  of  a  solution  to 

Problem  Al.l 

Minimize  Lf  subject  to  /  e  S . 

Al.B  Proposition 

If  L  is  strictly  convex  and  S  is  closed  and  convex,  then  problem  Al  l  has  at 


most  one  solution. 
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A1.7  Proportion 

Let  S  be  a  closed  convex  subset  of  H.  If  L  \  5-*R  is  convex  in  S.  continuous 
in  S  and  if  {/njc5  and  !|/n||-»<»  then,  problem  Al.l  has  at 

least  one  solution 
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|  APPENDIX  A2 

DOCUMENTATION  FOR  DSCOMP  AND  DSEVAL 

\ 

i 


In  this  appendix  we  list  the  documentation  for  routines  DSCOMP  which  com¬ 
putes  the  spline  with  linear  constraints  and  DSEVAL  which  evaluates  the  spline 
at  some  set  of  points  provided  by  the  user.  We  also  list  here  all  the  routines 
needed  to  compute  the  spline,  except  the  routines  to  solve  the  quadratic  pro¬ 
gramming  problem.  These  routines  are  coded  in  Ratfor  (see  Kemighan  and 
Plauger.  1976). 
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******************************************************************* 

*  DSC  CMP  * 

******************************************************************* 

subroutine  dscomp(x, ldx, dim, nobs,s, Ids, neons, z,m,bl,bu,isigma, 

sigma, bigeig, zero, finity,  indlam, lamvec,nvalle, 
nval  ri  ,nval  am ,  i  save ,  i  sta  rt ,  i  state  ,nact  iv ,  i  tmax , 
ioptv ,  i  qpout ,  iactch ,  wmach , powers ,  ldp,bigm  ,coef , 
hhat, vo flan, the tam, iwork,liwork, work, lwork , ierr) 

double  precision  x(ldx,l) ,s(lds,l) ,z(l) ,£inity,zero,bigeig,sigma(l) , 
bl (1) ,bu(l) ,coef (1) ,hhat(l) ,  wmach (15) ,voflam(l) , 
bigeig ,  lamvec  (nvalle+nvalr  i+1 ) ,  the  tam,  work  ( 1 ) 

nteger  ldx,lds,m,dim,istate(l) , nobs, neons, lwork, istart,i sigma, 
liwork,iwork(l)  ,itmax,nvalam,istate(l) ,nactiv,isave, 
ioptv, nvalle,nvalri ,iactch(l) , indlam, powers (ldp, 1) ,ldp,bigm 

implicit  double  precision  (a-h,o-z) 

This  routine  computes  a  thin  plate  spline  with  linear  constraints 
as  the  solution  to  the  following  problem 

minimize: 

nobs  2  2 

SUM  (z(i)  -  h (x( i) )  /(sigma(i) )  +  lambda  *  J  (h) 

i-1  m 


subject  to 


bl(l) 

bl(2) 


.le. 

.le. 


h(s(l) ) 
h(s(2)) 


.le.  bud) 
.le.  bu(2) 


ldx 


bl(ncons)  .le.  h(s(ncons))  .le.  bu(ncons) 


Double (ldx, nobs) . 

Array  containing  the  abscissas  of  the  points  at 
which  observations  are.  The  rows  of  this  matrix 
correspond  to  the  dimension  of  the  space,  and 
the  col mms  to  the  number  of  observations. 

For  example  x(2,100)  is  the  second  coordinate 
of  the  100th  point. 

Integer  constant  or  variable. 

On  input  ldx  must  contain  the  leading  dimension 
of  array  x  as  defined  in  the  dimension  statement 
in  the  main  program,  ldx  should  be  greater  than 
or  equal  than  the  dimension  of  the  space. 


i 

■ p 
» 


u.v-V-1 


din  Integer  constant  or  variable 
Dimension  of  the  space. 


nobs 

s 

Ids 

neons 

z 

m 

bl 

bu 

isigma 


Integer  constant  or  variable. 

Contains  the  number  of  observations  for  this  problem. 
Double  (Ids, neons) 

Array  containing  the  points  at  which  the  linear 
constraints  are  enforced.  For  example,  s(2,100) 
denotes  the  second  coordinate  of  the  50th  point 
where  constraints  are  enforced. 

Integer  variable  or  constant. 

Leading  dimension  of  the  array  s.  Ids  should  be 
greater  than  or  equal  to  the  dimension  of 
the  space. 

Integer  variable  or  constant. 

Number  of  linear  constraints  being  enforced. 

Double  (nobs) . 

Vector  of  observations  at  the  points  in  x. 

Integer  variable  or  constant. 

Degree  of  smoothing  for  the  spline.  If  the 
dimension  of  the  space,dim  is  smaller  than  4 
then  2  .le.  m  .le.  7.  If  dimp  4  then  m  »  3,  4 
or  5.  If  dim"  5,  m  »  4  or  5.  If  dim^  6,  m-4. 

Double  (nobs  +  2*ncons) . 

Vector  containing  the  lower  bounds  for  the 
constrained  spline  in  the  last  neons  positions. 

The  first  nobs+ncons  positions  are  used  by  the 
quadratic  programing  routine. 

Double  (nobs  +  2*ncons) . 

Vector  containing  the  upper  bounds  for  the 
constrained  spline  in  the  last  neons  positions. 

The  first  nobs+ncons  positions  are  used  by  the 
quadratic  programing  routine. 

Integer  variable  or  constant. 

Indicates  whether  or  not  relative  weights  for 
approximate  errors  on  the  observed  values  are 
to  be  specified.  See  sigma  below. 

isigma  *  1  -  relative  weigths  are  not  specified 
and  assumed  to  be  all  equal  to  1. 
isigma  ■  0  -  relative  weithts  are  specified  in 
sigma. 


If  isigma  different  from  1  and  0,  isigma  -  1  is 
assumed. 

sigma  Double  (nobs) .  * 

Array  of  size  at  least  nobs  specifying  the  relative 
weights  for  approximate  errors  on  the  observed 
values.  If  sigma  *  1  then  sigma  is  not  used  and 
need  not  be  dimensioned  in  the  calling  program. 

bigeig  Double  precision  variable. 

If  lambda  for  unconstrained  problem  is  infinity, 
bigeig  should  contain  the  largest  eigenvalue 
obtained  when  the  unconstrained  problem  was  solved 
If  lambda  for  the  unconstrained  problem  is  finite 
bigeig  is  not  referenced. 

indlam  Integer  variable. 

On  return,  indlam  will  contain  a  number  such 
that  lamvec( indlam)  ■  log  of  the  estimate  of  the 
■soothing  parameter  lambda. 

lamvec  Doifcle  (nvalle  +  rival  ri  +  1) . 

On  entry:  lamvec  (1)  should  contain  the  estimate  of 
lambda  that  minimizes  the  generalized 
cross-validation  function  for  the  uncons¬ 
trained  problem.  If  this  value  is  infinity 
then  lamvec (1)  should  be  set  to  -l.dO. 

On  return: lamvec  will  contain  a  grid  of  values  in 

units  of  log  of  lambda  where  the  generalized 
cross-validation  function  for  constrained 
problems  was  evaluated.  Also,  lamvec  (indlam) 
will  contain  the  log  of  the  value  of  lambda 
that  minimized  the  OCVC  function. 

nvalle  Integer  variable. 

On  entry  nvalle  should  contain  the  number  of  values 
■nailer  than  lamvec (1)  at  which  OC\C  is  to  be  evaluated. 

nvalrl  Integer  variable. 

On  entry  rival  ri  shoud  contain  the  nuaber  of  values 
larger  than  lamvec (1)  at  which  is  to  be  evaluated. 

If  both  nvalle  and  nvalri  are  less  than  or  equal  to 
zero  then  nvalri  and  nvalle  are  set  to  15. 

If  lamvec (1)  -  O.dO  then  nvalle  should  be  equal  to  zero. 
If  lamvec(l)  ■  -l.dO,  that  is  the  estimate  of  lambda 
from  the  unconstrained  problem  is 
infinity,  then  nvalri  should  be  zero. 


rival  am  Integer  variable. 

On  return  nvalam  will  contain  the  number  of  values  of 
lambda  at  which  V(lambda)  was  evaluated. 

isave  Integer  variable  or  constant. 

Specifies  whether  or  not  various  intermediate  results 
computed  by  dscomp  are  to  be  saved  or  restored  from 
file.  If  two  or  more  problems  are  to  be  solved  with 
the  same  dim,  m,  x,  s,  and  sigma,  i.  e.,  changes  only 
in  z,  some  of  the  computations  can  be  bypass  for 
the  second  and  subsequent  problems  by  savino  (termed iate 
results  computed  for  the  first  problem. 

isave  ■  1  -  Do  not  use  any  intermediate  resu  saved 
and  save  results  computed  on  fil 

isave  *-l  -  Use  intermediate  results  saved  oi.  **le. 

isave  different  from  -1  and  1  -  Do  not  use  any  intermi- 
diate  results  saved  and  do  not  save  any 
results  computed  on  file. 

i start  Integer  variable  or  constant. 

Indicates  whether  or  not  an  initial  guess  for  the  points 
at  which  the  the  constraints  will  be  active  for  the 
value  of  lambda  that  minimizes  QCTC  will  be  given. 

istart  ■  1  -  Guess  for  set  of  active  constraints  is 

specified  in  the  vector  istate  (see  bellow) . 

istart  different  from  1  -  No  initial  guess  is  given. 

It  is  recommended  to  use  istart  ■  1  and  use  the  set  of 
points  where  the  constraints  are  most  seriously  violated 
as  the  initial  guess  (see  istate) . 

istate  Integer  (nobs  +  2*ncons) . 

On  entry:  The  last  neons  positions  of  this  array 

indicate  the  initial  guess  for  the  set  of 
active  constraints. 

On  retum:The  last  neons  positions  of  this  array  indicate 
the  set  of  active  constraints  at  the  solution. 

istate  (nobs+i)  *  0  -  ith.  linear  constraint  not  active. 

Istate  (nobs+i)  ■  1  -  ith.  linear  constraint  active  at 

its  lower  bound. 

istate  (nobs+i)  ■  2  -  ith.  linear  constraint  active  at 

its  upper  bound. 


nactiv  Integer  variable. 

On  entry:  The  number  of  active  constraints  in  the  initial 
guess  (if  istart*l). 

On  return:  The  number  of  active  constraints  at  the 
solution. 

itmax  Integer  variable. 

Maximum  number  of  iterations  for  the  quadratic 
programming  problem.  If  itmax  less  than  or  equal  zero 
then  itmax  is  set  to  100. 

ioptv  Integer  variable  or  constant. 

Indicates  whether  or  not  optimization  of  the  generalized 
cross-validation  function  for  constrained  problems  is 
desired. 

ioptv  ■  1  -  Optimization  of  V(lambda)  is  desired. 

ioptv  .ne.  1  -  Optimization  of  V(lambda)  not  desired. 

iqpout  Integer  variable  or  constant. 

Controls  printing  for  quadratic  programming  routine, 
iqpout  »  0  -  Nothing  is  printed  by  qpfc. 

*  2  -  One  line  of  output  for  each  quadratic 
programming  problem  solved. 

>10  -  Almost  all  relevant  information  is 
printed  for  debugging  purposes. 

iactch  Integer  (nvalle  +  nvalri  +  1). 

If  optimization  of  V(lambda)  is  requested  then  this 
array  indicates  for  each  value  of  lambda  tried  whether 
or  not  the  set  of  active  constraints  changed  and  whether 
or  not  the  quadratic  programming  problem  for  each  value 
of  lambda  had  a  solution.  If  for  a  specific  value  of 
lambda,  say,  lamvec(j)  tfri  quadratic  programming  problem 
did  not  have  a  solution  then  iactch  is  set  to: 

inform  »  2  -  The  problem  appears  to  be  infeasible. 

inform  ■  4  -  Too  many  iterations  have  been  perfomed. 

inform  *  5  -  Too  many  iterations  have  been  performed 
without  changing  the  solution.  The 
degeneracy  is  unresolved. 

inform  *  6  -  An  active  constraint  has  become  infeasible 
The  constraints  are  likely  to  be  very  badly- 
scaled.  Try  a  different  starting  point. 


If  lamvec  (j)  is  not  any  of  the  above  nunbers  then 
laravec  (j)  will  be  either  zero  or  one.  Vhen,  say 
lamvec  (j)  ■  1  and  laravec  ( j+1)  *1,  this  means  that 
changing  from  the  jth.  value  of  lambda  to  the  next  one 
did  not  change  the  set  of  active  constraints.  If 
lamvec (j)-l  and  lamvec( j+l)*Q,  then  the  active  set  of 
constraints  changed . 

wmach  Double  (15) . 

Machine  dependent  constants  required  by  the  quadratic 
programing  routine  qpfc. 

vmach(l)  -  Base  of  floating  point  arithmetic 
wmach(2)  -  No.  of  base  wmach (1)  digits  of  precision, 
wmach  (3)  -  Floating  point  precision, 

wmach (4)  -  sqrt(wmach(3)) . 

wmach(5)  -  Smallest  positive  floating  point  number, 
wmach (6)  -  Sqrt(wmach(5) ) . 

wmach (7)  -  Largest  positive  floating  point  number. 

wmach(9)  -  Sqrt  (vmach(7)). 

wmach (10)  -  Standard  file  number  for  the  input  stream, 
wmach (11)  -  Standard  file  number  for  the  output  stream. 

powers  Integer  (ldp, bigm) . 

fti  return  powers (i,j)  contains  the  power  at  which  the 
ith.  component  in  dim-space  is  raised,  corresponding  to 
the  jth.  d-ooefficient. 

For  example,  if  m"3  and  d*2,  powers  would  be  given  by: 

0  0  0  1  1  2 
0  12  0  10 


ldp  Integer  variable  or  constant. 

Specifies  leading  dimension  of  the  array  powers  as 
defined  in  the  dimension  statement  of  the  calling 
program,  ldp  should  be  larger  than  or  equal  to  dim. 

bigm  Integer  variable. 

On  return  bigm  wrill  contain  the  number  of 
d-coefficients  of  the  spline. 

(dim  +  m  -1) J 

bigm  *  — -  . 

(dim!*  (m-1)  I) 

coef  Double  (nobs  +  neons  +  bigm) . 

On  return:  coef  vd.ll  contain  the  coefficients  of  the 
spline.  The  coefficients  will  be  arranged 
as  follows: 

I  c  | 
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coef  *  |  b  I 
1  d  I 

where  c  is  an  nobs  dimensional  vector  that 
will  contain  the  coefficients  of  the  spline 
corresponding  to  the  points  where  the  obser¬ 
vations  were  taken,  b  is  an  neons  dimensional 
vector  containing  the  coefficients  of  the 
spline  corresponding  to  the  points  where  the 
constraints  are  enforced,  d  is  a  bigm 
dimensional  vector  containing  the  coefficients 
of  the  polynomial  part  of  the  spline. 

hhat  Double  (nobs) . 

On  return,  hhat  will  contain  the  estimate  of  the  function 
h  at  the  sample  points  x. 

vo flam  Double  (nvalle+nvalri+1) . 

If  optimization  of  V(lambda)  was  requested  voflam 
will  contain  the  value  of  V(lamda)  at  the  values  of 
lambda  in  lamvec. 

A  negative  value  of  voflam (j)  means  that  the  quadratic 
problem  could  not  be  solved  for  the  lambda- 1  amv ec (j) , 
see  also  iactch. 

thetam  Double  precision  variable. 

On  return  thetam  will  contain  the  constant  that 
multiplies  E  . 

m 

iwork  Integer  (liwork) . 

Integer  work  array  of  dimension  at  least  liwork. 

liwork  Integer  variable. 

On  entry  iwork  specifies  the  dimension  of  the  array 
iwork.  liwork  should  be  greater  than  or  equal  to 
n total  +  2*ncons. 

work  Double  (Iwork) . 

Double  precision  work  array  of  dimension  Iwork. 

Iwork  Integer  variable  or  constant. 

On  entry  Iwork  specifies  the  dimension  of  work. 

Iwork  .ge.  n total* (2*ntotal+2*bignH-nobs+ncons+2)+ 
nmbigm*  (nrobignH-2 )  +tnax  imum  ( nnnl ,  nnn2 ) 

where  ntotal-nobs+ncons 
nmbigm-ntotal-bigm 
nnnl^itotal*  (ntotal+ncons+10)  + 
neons* (ncons+5)  +  2 


nnn2«ntotal* (3*ntotal+l) 


ierr  Integer  variable. 

On  return  ierr  will  contain  an  error  code. 

ierr*0  -  No  errors  occurred. 

ierr>0  -  Error (s)  in  input  parameters  occured. 
ierr  is  a  number  of  the  form 
abcdefg,  where  each  digit  is  either 
zero  [no  error  occured]  or  one 
[some  parameter (s)  value(s)  is  (are) 
invalid]#  as  follows: 

a>l  -  Invalid  combination  of  m  and  dim. 

b-1  -  Neons  <  1. 

c*l  -  nobe+ncons-bigm  <  1. 

dHl  -  Either  ldx<dim  or  lds<dim  or  ldp<dim. 

e*l  -  sigma  (i)  .le.  0  for  some  i. 

f-1  -  lwork  smaller  than  required. 

g-1  -  liwork  smaller  than  required. 

ierr<0  -  Errors  in  the  solution  of  the  problem 
independent  of  the  quadratic  programming 
problem  occured: 

ierr*-10  -  Values  for  nobs, neons, bigm  in 
first  record  of  matrix-file  do 
not  agree  with  current  values. 

ierr»-ll  -  Insufficient  data  in  matrix-file 

ierr»-20  -  Matrix  IK)  '  *  E  (1)  *  Q 

2  2 

is  not  positive  definite.  This 
may  be  caused  by  replicates  in 
the  original  data  set. 

err*0 

all  dsbigm(dim,m,bigm) 

compute  thetam  . 

f  (bigm  1*  0)  call  dsteta(dim,mrthetam) 
ntotal-nobs+ncons 
nmbign^n  total -bigm 

* 


allocate  work  storage 


n2-ntotal*bignH-nl 

n3-ntotal*bigm+n2 

n4*big»fn3 

n5^itotal*ntotal+n4 

n6-bigmfn5 

n7^ntotal*ntotal+n6 

n8^mbigm*nnbigmfn7 

n9-nobs*ntotal+n8 

nl0-ocons*ntotal+n9 

nll^itotal+nlO 

nl2-ntotal+nll 

nrml-ntotal* (ntotal+ncons+1 0) +ncons* (ncons+5)+2 
nnn2-ntotal*  (3*ntotal+l) 
lwk-maxO  (nnnl, nrm2) 
liwk-ntotal+2*ncons 

# 

* 

t  minimum  dimension  of  work 

* 

mirmk-nl2+lwk 

iflag-0 

if  (Isigma  1«  0)  1  sigma-1 
if  (isigma  1-  1)  ( 
do  i— 1 rnobs 

if(sigma(i)  <-  O.dO) 
i flag-1 

} 

if  (liwork  <  lit*)  ierr-1 
if  (lwork  <  minwk)  ierr»ierr+10 
if  (iflag  —  1)  ierr-ierr+100 

if  (ldx  <  dim  I  Ids  <  dim  I  ldp  <  dim)  ierr-ierr+1000 
if  (rmbigm  <-  0)  ierr-ierr+10000 
if  (neons  <  1)  ierr-ierr+1 00000 
if  (bigm  —  0)  ierr«ierr+1000000 

if  (ierr  >  0)  return 
if  (itmaoc  <-  0)  itmax-100 
if  (ioptv.  I-  1)  ioptv-0 
if  (labs(isave)  1-  1)  i save-0 
if  (iscart  !-  1)  istart  -  0 
if  (nvalri+nvalle  <«  0)  { 
if  (lamvec(l)  <  -0.5d0) { 
nvalri-0 
nvalle-30 

} 

else  ( 

if  (lamvec(l)  <-  O.dO)  { 
nvalri-30 
nvalle-0 

} 

else  { 

nvalri-15 


nvalle-15 

} 

} 

} 

« 

#  set  machine  constants 

» 

call  dsmach(wnach) 
zero-wmach(5) 
finity*uach(9) 
nvalamrovalle+nvalri+1 

call  dscom2(x,ldx,dim,nob6,s,lds,ncons,z,m,bl,bu,isigma,sigma, 
bigeig, zero, f  ini  ty,  Indian, lanvec,nvalle,nvalri, 
nvalam , i save , i start , i state , nact iv , i tmax , ioptv , i qpout 
iactch  ,«mach  , powers  ,ldp  ,nact  iv  ,coef  ,hhat , 
vofian,thetan,ntotal,nmbigm,bigm,work(nl)  ,work(n2) , 
work(n3) ,work(n4) ,work(n5) ,work(n6) ,work(n7) , 
work(n8) ,work(n9) ,work(nlO) ,work(nll) ,work(nl2) , 
lwk,iwork,liwork,ierr) 

ierr*-ierr 

return 


************************************************************1.  ****** 

*  DSETVAL  * 

******************************************************************* 

subroutine  dseval(x,ldx,dim,m, nobs, s,lds, neons, xtab,ldxtab, 
ntab,coef , powers, ldp,hofxta , ierr) 

double  precision  x(ldx,l) ,s(lds,l) ,xtab(ldxtab,l) ,coef(l) , 
hofxta(l) ,p(6) ,hxta 

nteger  dim,nobs, ldx, Ids, ldxtab,ntab, powers(ldp, 1) , ierr, m,bigm 

This  routine  evaluates  the  spline  h,  computed  by  dscomp 
at  the  set  ntab  points  in  xtab.  The  resulting  values  of 
the  spline  are  put  in  hofxta. 

x  Double (ldx, nobs) . 

Array  containing  the  abscissas  of  the  points  at 
which  observations  are.  The  rows  of  this  matrix 
correspond  to  the  dimension  of  the  space,  and  the 
columns  to  the  number  of  observations.  For  example 
x(2,9)  is  the  second  coordinate  of  the  9th  point. 

ldx  Integer  constant  or  variable. 

On  input  ldx  must  contain  the  leading  dimension  of 
array  x  as  defined  in  the  dimension  statement  in 
the  main  program,  ldx  should  be  greater  than  or 
equal  to  the  dimension  of  the  space. 

dim  Integer  constant  or  variable. 

Dimension  of  the  space. 

nobs  Integer  constant  or  variable. 

Contains  the  number  of  observations  for  this  problem. 

s  Double  (ids, neons) 

Array  containing  the  points  at  which  the  linear 
constraints  are  enforced.  For  example,  s(2,100) 
denotes  the  second  coordinate  of  the  50th  point 
where  constraints  are  enforced. 

Ids  Integer  variable  or  constant. 

Leading  dimension  of  the  array  s.  Ids  should 
be  greater  than  or  equal  to  the  dimension  of 
the  space. 

neons  Integer  variable  or  constant. 

Number  of  linear  constraints  being  enforced. 

Integer  variable  or  constant. 

Degree  of  smoothing  for  the  spline.  If  the 
dimension  of  the  space, dimis  smaller  than  4 
then  2  .le.  m  .le.  7.  Ifdim-  4  then  m  * 


m 


3,  4  or  5.  Ifdim*  5,  m  ■  4  or  5.  If dim-  6 
m  »  4. 


powers  Integer  (ldp, bigm) . 

On  entry  powers (i,j)  should  contain  the  power  at  which 
the  ith.  component  in  dim- space  is  raised,  corresponding 
to  the  jth.  d-coefficient.  This  matrix  is  computed  by 
dscomp. 

ldp  Integer  variable  or  constant. 

Specifies  leading  dimension  of  the  array  powers  as 
defined  in  the  dimension  statement  of  the  calling 
program,  ldp  should  be.  larger  than  or  equal  to  dim. 

bigm  Integer  variable. 

On  return  bigm  will  contain  the  number  of  d-coefficients 
of  the  spline. 

(dim  +  m  -1)  1 

bigm  - -  . 

(dimi*(m-l)!) 

coef  Double  (nobs  +  neons  +  bigm) . 

On  entry  coef  should  contain  the  coefficients  of  the 
spline  as  obtained  by  dscomp.  The  coefficients  should 
be  arranged  as  follows: 

I  c  | 
coef  ■  |  b  I 
I  d  I 

where  c  is  an  nobs  dimensional  vector  that 
will  contain  the  coefficients  of  the  spline 
corresponding  to  the  points  where  the  obser¬ 
vations  were  taken,  b  is  an  neons  dimensional 
vector  containing  the  coefficients  of  the 
spline  corresponding  to  the  points  where  the 
constraints  are  enforced,  d  is  a  bigm 
dimensional  vector  containing  the 
coefficientsof  the  polynomial  part  of  the 
spline. 

hofxta  Double  (ntab) . 

On  return,  hofxta  will  contain  the  estimate  of  the 
function  h  at  the  ntab  points  in  xtab. 

thetam  Double  precision  variable. 

On  return  thetam  will  contain  the  constant  that 
multiplies  E  . 


Integer  variable. 

On  return  ierr  will  contain  an  error  code. 

ierr»0  -  No  errors  occurred. 

ierr>0  -  Error (s)  in  input  parameters  occur ed. 
ierr  is  a  nunber  of  the  form 
abc,  "here  each  digit  is  either 
zero  tno  error  occured]  or  one 
[some  perameter(s)  value (s)  is  (are) 
invalid] ,  as  follows: 

a-1  -  Either  ldx<dia  or  lds<dim  or  ldp<dim 
or  ldxtab  <  dim. 
b*l  -  nobs+ncons-bigm  <  1. 
c-1  -  Invalid  combination  of  m  and  dim. 


#  Get  bigm 

* 

call  dsbigm(dim,m,bigm) 

* 

#  Get  thetam 

* 

if  (bigm  1*  0)  call  dsteta (dim ,m, thetam) 
nto  talfioba  i  neons 
nmbigmpnto  tal-bigm 

if  (ldxCdim  |  lds<dim  I  ldxtabCdim  I  ldp<dim)  ierr  ■  1 
if  (nmbigm  <■  0)  ierr  ■  ierr+10 
if  (bigm  0)  ierr»ierr+100 
if  (ierr  1*  0)  return 
do  j«l,ntab  { 
do  i"l,dim{ 

p(l)*xtab(i,j) 

) 

#  Evaluate  spline  at  point  p 

call  dshofp(p,x,ldx,dim, nobs, s,lds,ncefis,ntotal,m, thetam, 
bigm,powers,ldp,coef  ,hxta) 
hofxta( j)-hxta 


return 
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******************************************************************* 

*  DSBICM  * 

******************************************************************* 


subroutine  dsbigm  (dim,  m,  bigm) 


get  bigm  for  dim,m  if  legal,  if  illegal,  bigm 
is  set  to  zero,  legal  values  of  dim  and  m  are: 


m  dim  1 


yes  yes  yes  no  no  no 

yes  yes  yes  yes  yes  no 

yes  yes  yes  yes  yes  yes 

yes  yes  yes  yes  no  no 

yes  yes  yes  no  no  no 

yes  yes  yes  no  no  no 


implicit  integer  (a-z) 

dimension  mtab(36) 

data  mtab  /  2,  3,  4,  5,  6,  7, 

3,  6,  10,  15,  21,  28, 

4,  10,  20,  35,  56,  84, 

0,  15,  35,  70,  0,  0, 

0,  21,  56,  0,  0,  0, 

0,  0,  84,  0,  0,  0  / 

* 

if  (dim  .It.  1  .or.  dim  .gt.  6}  go  to  13 
if  (m  .It.  2  .or.  m  .gt.  7)  go  to  13 
1  ■  6*dim  +  m  -  7 
bigm  ■  mtab(l) 
return 

#  error  return 
13  bigm  *  0 
return 
end 


******************************************************************* 

*  DSCBD  * 

******************************************************************* 


subroutine  dscbd  (ntotal  ,nobs  ,nmbigm ,bigtn,coef  ,tsigqr , 
qr  tsi  a  ,work ,  i  sigma ,  s  igma) 

double  precision  coef (ntotal) ,tsigqr (ntotal, bigm) ,qrtsia(bigm) , 
work(l) ,sigma(nobs) 

integer  ntotal, nobs, nmbigm, bigm, isigma 

t 

t  Obtain  vector  of  coefficients  of  the  spline 

» 

#  I  c  | 


#  Recall  that  the  vector  coe£  contains  the  solution 

#  to  the  lower  dimensional  problem  (n total  -  bigm) 

#  Zeroes  to  first  bigm  pos.  of  work 

do  i*l,bigm 
work(i)«0.d0 

#  Copy  coef  into  work(bignH-l) 
call  dcopy (ntotal, coef,  1, work  (bigmfl)  ,1) 

#  get 

#  I  c  | 

#  I  I  ■  Q  *  coef  -  |Q  Q  |*work 

#  I  b  |  2  112  1 

job  »  10000 

call  dqrsl ( tsigqr , ntotal ,ntotal ,bigm ,qr tsia , work , coef , 
dmy,dkny,dmy,dmy,  job ,  info) 

#  Copy  last  bigm  elements  of  work  into 

#  last  elements  of  coef 

call  dcopy ( bigm ,%*>  rk  ( nto tal+l )  ,l,coef  (ntotal+1)  ,1) 

#  Rescale  by  sigma  if  signal's  were 

I  read 

if  (isigma  ■■  1) 
return 

else  { 

do  i»l,nobs 

coef (i)acoef ( i) /sigma (i) 

} 

return 

end 


******************************************************************* 

*  DSCOLE  * 

******************************************************************* 

subroutine  dscole( j,cje,ntotal,x,ldx,nobs,s,lds,ncons,ra,pi,pj, 
dim,thetam) 

double  precision  x(ldx,l) ,s(lds,l) ,pi (dim) ,pj (dim) ,thetam,cje(l) 
integer  j, ntotal, ldx. Ids, dim 

I 

t  Compute  lower  diagonal  elements  of  the  jth  column  of  E 


#  and  put  in  vector  cje. 

# 

#  If  j  <»  nobs  retrieve  jth  sample  point 

#  else  retrieve  the  {j-nobs)  grid  point  in  s 

# 

if  (j  <*  nobs)  { 
do  i-l,dim 
pj(i)«x(i,j) 

} 

else  { 

do  i»l,dim 

pj(i)«s(i,  j-nobs) 

} 

if  ( j<*nobs) 

jl-1 

else 

jl*j-nobs 
do  i* j ,nobs  { 
do  ilal,dim 

pi(il)-x(il,i) 

#  Obtain  E(pi,pj) 

call  dseij(pi,pj,dim,m,cje(i)) 
cje(i)»cje(i)*thetam 

} 

do  i*jl, neons  { 
do  i 1*1, dim 

pi(il)-s(il,i) 

il»i+nobs 

call  dseij  (pi,pj,dim,m,cje(il)) 
cje(il)«cje(il)  *thetan 

} 

return 

end 


subroutine  dacom2(x,ldx,dira,nobs,s, Ids, neons, z,m,bl,bu,isigma,sigma 
bigeig , zero , f ini ty, Indian, lamvec ,nvalle ,nvalri , 
nval  am , i save , i start , i state , nact iv , i tmax , i optv , i qpout , 
i actch ,*mach , powers , ldp , nactiv ,coef ,hhat , 
vofl  am,  thetam,n  total  ,nmbigm,bigm,tsigma,tsigqr , 
qrtsia ,esigma  ,qrtsil ,hessl ,hess2,  qpl ima,qpcons,qpl ive, 
eigen,work,lwork,iv©rk,livork,ierr) 

# 

double  precision  x(ldx, nobs) ,s(lds, neons) ,finity, zero, bigeig, 

sigma(l) ,bl(l) ,bu(l) ,coef (1) ,hhat(l) ,  voflam(l) , 
thetam,tsigma(ntotal,l) ,tsigqr(ntotal,l) ,qrtsia(l) 
esigma(ntotal,l) ,qrtsil(l) ,hessl  (ntotal,!) , 
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hess2(nmbigm,l)  ,qplima(nobs,l) , 
qpcons( neons, 1) ,qplive(l) ,eigen(l) , 
work(l) 

I 

integer  ldx,  Ids, m,dim,i  state  (1) , nobs, neons, Iwork, 

1 iwork, iwork (1) ,itmax,i start, i sigma, i save, 
ioptv ,nvall e , nval r i , i state ( 1 ) ,nact i v , 
powers(ldp,l) ,ldp,bigm,iactch(l) , 

Indian,  n  total  ,nmbigm 

* 

logical  start 


Second  level  routine  to  compute  a  thin  plate  spline  with  linear 
constraints  as  the  solution  to  the  following  problem 

minimize: 

nobs  2  2 

SIM  <z(i)  -  h(x(i))  /(sigma(i) )  +  lambda  *  J  (h) 

i*l  m 

subject  to 

bl(l)  .le.  h(s(l))  .le.  bud) 

bl(2)  .le.  h(s(2))  .le.  bu(2) 


e 

bl (neons)  .le.  h(s (neons))  .le.  bu( neons) 


For  a  description  of  the  variables  and  arrays  see  comments  in 
dscomp.r 


#  Get  powers  of  polynomials 

# 

nt-ntotal 

nmbmnbigm 

nprinl«minO  (20,ntotal) 
nprin2*min0 (10, nobs) 
nprin3^nin0 (10, neons) 
call  dspoly  (dim,m,powers,ldp) 

# 

#  Scale  z  data  by  1/sigma  if  necesary 


f  (isigma  ”  0)  call  dszdis  (z, nobs, sigma) 
i  If  matrices  previously  saved  read  them. 


skip  computations  and  go  to  100 


t 

» 

if  (isave  —  -1)  { 

open  (lffile^'splmatrix' , form" “unformatted') 
rewind  (1) 

call  dsrdl  (ntotal, nobs, neons, bigm,dim,ranbigm,hessl,hess2,qpliina, 
qpcons ,tsigqr ,qrtsia ,esigma , ierr) 
if  (ierr  l»  0)  return 
go  to  100 

} 

#  Form  tsigma 

call  dstsig  ( x ,  ldx  ,s ,  Ids , powers ,  ldp ,  tsigma , ntotal  ,bigm , nobs , neons , 
d  im ,  i  sigma ,  s  igma ) 

#  Copy  tsigma  in  tsigqr 

do  j*l,bigm 

call  dcopy  (ntotal, tsigma(l, j) ,1, tsigqr (l,j) ,1) 

#  Form  QR  decomposition  of  tsigma 

job  *  0 

call  dqrdc (tsigqr  ,ntotal  ,ntotal  ,bigm,qrtsia , jpvt ,work , job) 

#  Form  esigma 

call  dsesig( esigma, ntotal , nobs, neons, isigma, sigma, x, ldx, 
s ,  Ids  ,wo  rk ,  the  tarn  ,d  im  ,m) 

#  Form  hessl 

call  dshesl (hessl, ntotal , nmbigm, nobs, tsigqr ,qrtsia, tsigma, bigm, 
esigma, work) 

#  Form  hess2  ■  Q  '  *  E  *  Q 

#  2  2 

call  dsqtaq'tsigqr , ntotal , ntotal ,bigm,qrtsia , esigma, ntotal , 
hess2,nmbigm,work(l)  ,work(ntotal+l) ) 

f  Compute  qpcons  (matrix  of  linear  const.) 

call  d scons (qpcons, neons , ntotal ,nobs , tsigma ,bigm , tsigqr ,qr tsia , 
nmbigm, esigma,  work) 

f  Compute  qplima  (matrix  that  post-multi plies 

#  zsigma  to  obtain  linear  term  in  quadratic 

#  function.  Also,  hhat  is  obtained  by 

I  multiplying  qplima*coef 


t 

call  dsliina(qpliina,nobs,ntotal  ,bigm,nmbigm,tsigma,  tsigqr , 
qrtsia ,esigma ,work) 

* 

#  Write  intermediate  results  if  required 

* 

if  (isave  *■  1)  { 

open  (ljfile-'splmatrix'  ,fonn“' unformatted' ) 
rewind  (1) 

call  dswr 1 1  (nto tal , nobs , neons , b  igm  ,d  im ,  nmbigm  ,hess  1 , 
hess2,qpl ima ,qpcons , tsigqr , qrtsia ,esigma) 

} 

# 

100  continue 

I 

#  Compute  qpl i ve*vecto r  defining  linear  term 

* 

call  dsl ive (qpl ive,ntotal ,z ,nobs,qpl  ima) 

# 

#  Find  coefficients  of  spline  doing  optimization 

#  with  respect  to  lambda  if  necessary 

# 

if  (istart  “1) 
start  »  .true, 
else 

start  *  .false. 

nnnl-ntotal* (ntotal+ncons+9)+ncons* (ncons+3)+l 
nnn2-n total* ( ncons+3 ) +1 
lwk-maxO (nnnl,nnn2) 

1 iwk-nto tal+ncons 

call  dssolv(hessl,ntotal ,hess2,nmbigm,bigm,qplima,qplive,qpcons, neons, 
nobs ,  tsigma  fqrtsi  1 , 1  amvec ,  vofl  am ,  indlam  ,coef ,  ioptv ,  iqpout , 
nvalle,nvalri, rival  am, zero, finity,bl,bu, start,  istate,iv*ork, 
iwork (ncons+1 ) , iactch ,wnach ,bigeig , i  tmax ,hhat ,z ,esigma , 
nactiv ,e igen ,work , i err ) 
if  (ierr  l«  0)  return 


#  obtain  final  coefficients  of  spline 

#  and  put  again  in  vector  coef  in  the 

#  form: 

#  I  c  I 

#  coef  ■  1  b  | 

#  I  d  | 

call  dscbd (ntotal,nobs,nmbigm,bigm, coef , tsigqr, qrtsia, work, isigma, sigma) 

#  Multiply  z  values  by  sigma  if  sigma  was  read 

if  (isigma  n  0)  call  dszmus(z, nobs, sigma) 

return 

end 


******************************************************************* 

*  DSCCNS  * 

******************************************************************* 


subroutine  dscons (qpcons, neons, ntotal ,nobs, 

tsigina  ,bigm ,  tsigqr  ,qr  tsia  ,nmbigm  ,esigma  ,work) 
double  precision  qpcons (neons, ntotal) ,tsigma (ntotal ,bigm) , 
tsigqr (ntotal ,bigm) ,qrtsia (bigm) , 
esigma (ntotal, ntotal)  ,work(l) 
nteger  neons ,nto tal , nobs , bigm , nmbigm 


Compute  matrix  qpcons  which  defines  linear  constraints 


cpcons 


IE  | 
Q  **l  121 
2  IE  | 
I  221 


T  ' 
2 


lob-01000 
2*nmbignH-l 
3-bignH-l 
do  i«l, neons  { 
il»nobs+i 


Compute 


IQ  ’I 
I  1  I 


work 


IQ  'I 
I  2 


I  *  ilst.  col.  of  esigma 


call  dqrsl( tsigqr, ntotal, ntotal, bigm ,qrtsia, 

esigma(l,il)  ,dmy,work(l)  ,dmy,dmy,dmy, 
job, info) 

#  Copy  last  nmbigm  elements  of  work  into 

#  first  nmbigm  columns  of  ith.  row  of  qpcons 

call  dcopy(nnbigm,work(13)  ,1, qpcons (i,l) , neons) 

#  Copy  ilst  row  of  tsigma  into  last 

#  bigm  columns  of  ith  row  of  qpcons 

call  dcopy(bigm,tsigma(il,l) , ntotal, qpcons (i,i 2) , neons) 

} 

return 
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I 

t'l 


******************************************************************* 

*  DSEIGE  * 

******************************************************************* 


subroutine  dseige(ntlmm,a,b, eigen, work, ierr) 

double  precision  a(ntlmm,ntlmm)  ,b(ntlmm,ntln«n)  ,eigen(l)  ,work(l) 

integer  ntlmn,ierr 

t  Solve  generalized  eigenvalue  problem 

#  a*PHI  -BIG  *  b  *  PHI 

#  v  v 

#  to  get  the  neig  positive  eigenvalues  of 

#  -1 

#  b  *a 

ierr*0 

nl-1 

n2-ntlnuH-l 

call  reduc(ntlmm,ntlnn,a,b,work(n2)  ,ierr) 
if  (ierr  ■“»  7*ntlnnrt-l) 

#  b  is  not  positive  definite 
return 

call  tredl(ntlflm,ntlmm,a,eigen,work(nl),work(n2)) 
call  tqlrat(ntlmm,eigen,work(n2) ,ierr) 
if  (ierr  >  0)  { 
iiii-ierr-1 
print  ' 

print  ♦,'******  in  routine  dseige,  only  eigenvalues  1  to  ',iiii 
print  are  correct' 

for  (i»ierrji<«ntlnm;i«i+l) 
eigen (i)-0.0d0 

} 

ierr-0 

return 

end 


******************************************************************* 

*  DSBIJ  * 

******************************************************************* 

subroutine  dseij (pi,pj,dim,m,eij) 
double  precision  pi (dim) ,pj (dim) ,eij 
integer  dim,m 

implicit  double  precision  (a-h,o-z) 

* 

I  Compute 

*  I  |2m-dim  I  ! 

#  e(pi,pj)  *  Ip  -p  I  lnjp  -p  I  if  dim  is  even. 


#  I  I 2m-dim 

#  ■  Ip  -p  I  if  dim  is  odd. 

#  I  i  j! 

# 

go  to  fl, 2, 3, 4, 5, 4), dim 

# 

1  nn^2*m-l 
s«dabs(pi(l)-pj(l)) 
eij»s**mu 

return 

2  mu*m-l 

s-(pi(l)-pj(l))**2  +  (pi  (2)  — pj  (2)) **2 
if  (mu  “  1) 

ei j«0. SdO*dlog (s) *s 
else 

ei j-0. 5d0*dlog  (s)  *s**mu 
return 

3  mu-2*m-3 

s-(pi(l)-pj(l))**2+  (pi  (2)-pj  (2) )  **2+  (pi  (3)-pj  (3) )  **2 
if  (mu  —■  1) 
eij-dsqrt(s) 
else 

eij«dsqrt(s)**mu 

return 

4  nnHW2 
s-(pi(l)-pj(l))**2 
do  i*2,dim 

s-a+(pi(i)-pj(i))**2 
if  (nu  ■■  1) 

ei j»0. 5d0*dlog (s) *s 
else 

ei  j«0. 5d0*dlog  (s)  *s**mu 
return 

5  mu“2*m-5 
s-(pi(l)-pj(l))**2 
do  i«2,dim 

s-»+(pi(i)-pj{i))**2 
if  (mu  “  1) 
eij«dsqrt(s) 
else 

eij*dsqrt(s)  **mu 
return 
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******************************************************************* 

*  DSESIG  .  * 

******************************************************************* 


subroutine  dsesig( esigma, ntotal, nobs, neons, isigma, sigma, 
x ,  ldx ,  s ,  Ids  ,wo  r  k ,  the  tam  ,d  im  ,m) 
implicit  double  precision  (a-h,o-z) 

double  precision  esigma(ntotal,ntotal) ,sigma(nobs) ,x(ldx,l) , 
s(lds,l) ,work(l) , the tan 
integer  ntotal, nobs, neons, isigma, ldx, Ids, dim 


Compute: 


esigma 


-1  -1 
SIGMA  E  SIGMA 
11 


E  SIQ4A 
21 


-1 

SI91A  E 


do  j«l, ntotal  { 


Get  lower  diagonal  elements  of  jth.  column  of  E 


nwl«dinrt-l 

call  dscole(j, esigma (l,j) ,ntotal,x, ldx, nobs, s, Ids, 
neons ,m,work(l)  ,work(dinH-l)  ,dim,thetam) 
if  (isigma  «■  0)  { 
if  (j  <■  nobs) 
jl-nobs+1 
else 

jl*ntotal+l 

for  (i*j;  i<»nobs;  i*i+l)  { 

esigma(i,j)«esigma(i,j)/(sigma(i)*sigma( j) ) 

} 

for  (i«jl;  i<*ntotal;  i-i+l){ 

esigma  ( i  ,j )  -esigma  ( i ,  j )  /sigma  ( j ) 

} 


Copy  lower  diagonal  elements  into  upper  diag. 


do  j *2, ntotal 
do  i»l,j-l 

esigma ( i ,j ) -esigma ( j , i ) 

return 

end 


******************************************************************* 
*  DSGETA  * 


******************************************************************* 

subroutine  dsgeta (a,nobs,ntlmm,ntl ,bigm, tsigma ,qrtsla, workv, workm) 
double  precision  a(ntlHn,ntlmm) ,  tsigma(ntl,bigm)  ,qrtsla(bigm) , 
workv(ntl)  ,workm(ntl,ntl) 
integer  nobs,ntlam,ntl,bigm 

I 

#  Compute  matrix  a  ■  Q  '  *  V  *  Q 
«  2  2 

* 

#  Form  V  put  in  workm 

I 

do  j«l,ntl  { 
do  i«l,ntl  { 

if  (i«-j  &  iOnobs) 
workm (i,j)*1.0d0 

else 

workm  (i,j)*0.0d0 

} 

} 

call  d  sqtaq(  tsigma, ntl,ntl,bigm,qrtsla, workm, ntl, a, ntlmm, 
workv, workm) 

return 

end 


******************************************************************* 

*  D6GETB  * 

******************************************************************* 

subroutine  dsgetb(b,ntotal ,nobs,ntl ,ntlmm,bigm, 

esigma ,  tsigma  ,qr  tsl  a , workv ,  workm , 
istant,nactiv) 

double  precision  b(ntlmm,ntln«n)  ,esigma(ntotal,ntotal) , 
tsigma (ntl,bigm)  ,qrtsla(bigm)  ,workv(l) , 
workm  (ntl,ntl) 

integer  ntotal,nobs,ntl,ntlnin,bigm,istant(l)  ,nactiv 

#  Compute 

|  b  -  Q  '  *  E(l)  *  Q 

#22 


Copy  rows  and  cols,  of  esigma  that 
correspond  to  active  constraints  in 
workm,  colunn  by  colimn 


do  j-l,nobs  { 


call  dcopy  (nobs,esigma(l, j) ,l,workm(l, j)  ,1) 
il-nobs 

do  i-nobs+1 ,ntotal  { 

if  (istant (i-nobs)  |«  0)  { 
il-il+1 

workm(il, j)«esigma(i,j) 

} 

} 

} 

jl^iobs 

do  j-nobs+l,ntotal  { 

if  ( istant ( j-nobs)  1*  0}  { 
jl-jl+1 

call  dcopy(nobs,esigma(l, j) ,l,workm(l, jl) ,1) 
il-nobs 

do  i-nob»tl,ntotal  { 

if ( istant ( i-nobs)  1-  0)  { 
il-il+1 

workm(il, jl)-esigma(i,j) 

} 

} 

} 

} 

» 

#  Compute  b 

* 

call  dsqtaq(tsigma,ntl,ntl,bigm,qrtsla,worta,ntl,b,ntljan, 
workv,workm) 

return 


******************************************************************* 

*  DSGRIA  * 

******************************************************************* 


subroutine  dsgrla(lanbda, lamvec, nvalle,nvalri,nvalam, 
bigeig ,ntotal , zero) 

double  precision  lambda, lamvec(i) , bigeig, zero 
integer  nvalle,nvalri,nvalam,ntotal 

* 

#  Construct  regular  grid  around  lambda  (from  unconstrained 

#  problem) 

# 

if  (lambda  <-  zero)  ( 
if  (lambda  <«  -0.5d0)  { 
rival  am-nvalle+1 

lamvec(nvalam) -dlogl0(ntotal*bigeig*l.d3) 
lamvec  (rival  le)  -lamvec  (nvalam)  -2.d0 
lamvec  (nvalle-1 ) -lamvec  (rival  le)  -1  .d0 
lamvec  ( rival  1  e-2 )  -lamvec  ( rival  1  e)  -2 .  dO 


for  (i-rwalle-3;  i>-l?  i-i-1) 
lamvec (i) -lamvec ( i+l)-0. 1 

} 

else  { 

rival  an-nvairi+1 
lamvec (l)-dloglO (zero) 
for  (i-2;  i< *nval am;  i-i+1) 
lamvec ( i ) -lamvec ( 1-1 ) +0 . 1 

} 

} 

else  { 

nvalam-nvalle+nvalri+1 

il-nvalle+1 

lamvec (il)-dloglO (lambda) 
for  (i«il+l;  i<*nvalam;  i-i+1) 
lamvec (i) -lamvec (i-1) +0.1 
for  (i-11-1;  i>-l;  i-i-1) 
lamvec ( i) -lamvec ( i+1 )-0. 1 

} 

return 

end 


**************************************************»************»*»* 

#  DSHE11  * 

******************************************************************* 

subroutine  d she 11  (hess 1 , n total , t sigqr , Id  t , es igma , b igm , nobs , 
qrtsla  ,workv ,  wor  tan) 

double  precision  hessl(ntotal  ,n  total) ,tsigqr (ntotal,bigm) , 

esigma(ntotal ,ntotal) ,qrtsia (bigm) ,workv(ntotal) , 
workm(ntotal ,n total) 
integer  n total ,ldt,bigm 

#  Form  upper  left  hand  comer  of  hessl 

#  Form: 

#  IE  E  E  E  |  |E  I 

#  I  11  11  11  121  I  111 

#  workm  -  I  I*  I  I  *  IE  E  I 

#  |E  E  E  E  I  IE  I  I  11  121 

#  I  21  11  21  211  |  211 

do  i-l,ntotal  { 
do  j-i,ntotal  { 

workm(i, j)«ddot(nobs,esigma(i,l) ,ntotal ,esigma(l, j) ,1) 

«orkm(  j,i)^orkm(i,j) 
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Obtain: 

Q  '  *  workzn  *  Q 
2  2 


call  dsqtaq(tsigqr,ntotal,ntotal,bigm,qrtsia,workm,ntotal, 
hessl,ntotal ,workv,workm) 

return 

end 


******************************************************************* 
*  DSHE12  * 


subroutine  dshel2  (hessl , nto tal ,  tsigqr  ,esigma  ,bigm  ,nobs ,  nmbigm , 
qrtsia,tsigma,workm) 

double  precision  hessl (nto tal, nto tal) , tsigqr (nto tal, bigm) , 
esigma (ntotal , nto tal) ,qrtsia(bigm) , 
tsigma  (nto  tal, bigm)  ,wor km  (ntotal  ,bigm) 
integer  nto tal, bigm, nobs, nmbigm 


Compute  upper  right  hand  comer  of  hessl 


Q  '*  I  111  *  T 
2  IE  |  1 

I  211 


IE  I 
I  111  *  T 
IE  I  1 
I  211 

put  in  workra 

do  j**l,bigm  { 
do  i«l,ntotal  { 

v*orkm(i,j)“ddot(nobs,esigma(l,i)  ,l,tsigma(l,  j)  ,1) 


Multiply  by 


job  -  01000 
do  j-l,bigm 

call  dqrsl(tsigqr, ntotal, ntotal, bigm, qrtsia,workm(l,j) , 
dusmy,workm(l,j)  ,durmy,dvjmmy,diinmy,  job, info) 

#  Put  last  nmbigm  rows  of  workm  which 

#  contain  hesl2  into  upper  right  hand 

#  and  lower  left  hand  comer  of  hessl 

j2«bigm+l 
do  j«l,bigm  { 
jl-nmbigmfj 

call  dcopy  (nmbigm, v*3rkm(j2,j) , 1, hessl (l,jl)  ,1) 
call  dcopy  (nmbigm, workm  (j2,j) ,  1, hessl ( jl,  1)  , ntotal) 

} 

return 

end 


A****************************************************************** 

*  DSHE22  * 

******************************************************************* 

subroutine  d she 22  (hessl, ntotal ,  nobs, bigm, nmbigm,  tsigma) 
double  precision  hessl (ntotal, ntotal) , tsigma (ntotal, bigm) 
integer  ntotal, bigm, nmbigm, nobs 

Compute  lower  right  hand  comer  of  hessl  ■ 

T  i  *  <r 
1  1 

do  j-nmbigm+1 , ntotal  { 
do  i»j, ntotal  { 

hessl (i,j)«ddot(nobs,tsigma(l,i-nmbigm)  ,l,tsigma(l,  j-ranbigm)  ,1) 
hessl (j,i) -hessl  (i,j) 

} 

return 

end 


******************************************************************* 

*  DSHEMU  * 

******************************************************************* 

subrout i ne  dshemu (n , j thcol , hess 1 , hess 2 , ntotal , nmbigm , 1 ambd a , y , hy) 
double  precision  y(l) ,hy(l) , hessl (ntotal, 1) ,hess2 (nmbigm, 1) , lambda, 
lamnob 

integer  n,j thcol, ntotal, nmbigm 
common  /block/nobs 


Note  hessl (ntotal , ntotal) , 
he  ss  2  (rmbigm  ,nmgigm) 


Multiply  hessian  times  vector  y  or  recover  jth  column  of  hessian 


n  number  of  elements  in  y 

jthcol  If  jthcol  *  0,  hy  will  contain  the  product 

hessian*y 

If  jthcol  >  0,  hy  will  contain  the  jth  column 
of  the  hessian 

hessl  Double  (ntotal , ntotal) . 

First  part  of  hessian  for  spline  problem. 
hess2  Double  ( rmbigm, ranbigm) . 

Second  part  of  hessian  which  gets  multiplied 
by  lambda. 

ntotal  Integer  variable. 

Row  dimension  of  hessl. 

rmbigm  Integer  variable. 

Row  dimension  of  hess2. 

'  lambda  Double  Precision  variable. 

Current  value  of  lambda. 


y  Input  vector  required  to  compute  hessian *y 

when  jthcol  »  0. 

hy  Output  vector  containing  jth-col  of  hessian  or 
hessian *y  depending  on  value  of  jthcol 


amnob-nobs*  lambda 
f  (jthcol  “  0)  { 

compute  hessian*y  put  in  hy 

do  j«l, rmbigm  { 

hy(  j)«ddot (ntotal , hessl (l,j)  ,l,y,l)+ 

^  lamnob*ddot(nmbigm,hess2(l, j) ,l,y,l) 

do  j»nmbigm+l, ntotal  { 

hy { j) -ddot (ntotal , hessl (1 , j ), l,y, 1) 


else  ( 


call  dcopy(ntotal , hessl (1, jthcol) ,l,hy,l) 
if  (jthcol  <■  rmbigm) 

call  daxpy(rmbigm,lamnob,hess2(l, jthcol) ,l,hy,l) 


return 
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end 


******************************************************************* 

*  DSHES1  * 

******************************************************************* 

subroutine  dshesl (hessl, ntotal ,nrobigm,nobs, tsigqr ,qrtsia,tsigma, 
bigm , esigma ,work) 

double  precision  hessl  (ntotal ,  ntotal) , tsigqr (ntotal ,bigm) , 
qrtsia(bign)  ,tsigma (ntotal, bigm) , 

•sigma (ntotal, ntotal)  ,work(l) 
integer  ntotal, bigm 

* 

#  Compute  hessl 

* 

#  Form  first  nmbigm  X  nmbigm  of  hessl 

f 

rrwl«ntotal+l 

call  dshell  (hessl , ntotal , tsigqr , ntotal , esigma , bigm ,nobs ,qr tsia , 
work(l) ,work(nwl)) 

# 

I  Form  upper  right  hand  comer  of  hessl 

* 

call  dshel2 (hessl, ntotal, tsigqr, esigma, bigm, nobs, nmbigm, qr tsia, 
tsigma,work) 

* 

#  Form  lower  right  hand  comer  of  hessl=«T  '*T 

*  11 

# 

call  dshe22 (hessl, ntotal , nobs, bigm, nmbigm, tsigma) 

return 

end 


******************************************************************* 


*  DSHOFP  * 

******************************************************************* 


subroutine  dsho  fp ( pi , x , ldx ,d im , nobs , s , Ids , neons , nto  tal ,m , 
the tarn , bigm , powers ,ldp ,coef , ho fp) 
double  precision  pl(6) ,x(ldx,l) ,s(lds,l) ,thetam,hofp,tl,t2,t3, 
p2(6) ,coef(l) ,eij,prod 

integer  ldx, lds,ncons, ntotal, bigm, Idp, powers (ldp, bigm) ,dim 

I 

#  This  routine  evaluates  the  spline  h  at  the  point  pi. 

* 

#  h(pl)-thetam  *  (tl  +  t2)  +  t3 

* 

#  where: 

#  nobs 


tl-  SIM  coef(i)*e(x  ,pl) 
i-1  i 

neons 

t2  *  SLM  coef (nobs+i) *e(s  ,pl) 
i-1  i 

bigtn 

C3  «  SUM  coef (ntotal+i)  *phi  (pi) 
i-1  i 

and 

e(p2,pl)  -  t**(2*m-dim)  if  dim  is  odd 

-  t**(2*m-dim)  *ln(t)  if  din  is  even 

t  -  II  p2  -  pi  ||. 


#  Compute  tl 

* 

tl  -  O.dO 
do  j-l,nobs  { 
do  i-1, dim 
p2(i)-x(i,j) 

call  dseij(pl,p2,dim,m,eij) 
tl-tl+coef ( j) *ei j 

} 

* 

#  Compute  t2 

* 

t2  «  O.dO 
do  j-1, neons  { 
do  i-1, dim 
P2(i)-s(i,j) 

call  dseij (pl,p2,dim,m,eij) 
t2-t2+coef (nobs+j) *ei j 

} 

# 

t  Compute  t3 

* 

t3H).dO 
do  j-l,bigm  { 
prod-1. OdO 
do  i-1, dim  { 
ip-powers(i, j) 
prod-prod* (pi (i) **ip) 

} 

t3-t3+coef (ntotal+j) *prod 

} 

hofp-thetam  *  (tl  +  t2)  +  t3 
return 
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I 

end 


******************************************************************* 
*  DSLIMA  * 

******************************************************************* 


subroutine  dsl  ima  ( qpl  ima , nobs ,  nto tal , bigm  ,nmbigm ,  tsigma , 
tsigqr ,qr tsia ,esigma ,work) 

double  precision  qpl ima (nobs, nto tal) , tsigma (ntotal, bigm) , 
tsigqr (nto tal ,bigm) ,qrtsia (bigm) , 
esigma (ntotal, nto tal) ,work(l) 
integer  nobs, nto tal, bigm, nmbigm 


Compute  matrix  qpl ima  which  defines  linear  term 


I 


cpl  ima* 


IE  | 
Q  '*1  111 
2  IE  | 
I  211 


T  ' 
1 


job-01000 

11- nnbignH-1 

12- bignH-l 

do  i-l,nobs  { 

call  dqr si (tsigqr , nto tal , nto tal, bigm, qr tsia, 

esigma(l,i)  ,dmy,work(l)  ,dmy,dmy,dmy, 
job, info) 

call  dcopy(rmbigm,vnork(i2)  ,l,qplima(i,l)  ,nobs) 
call  dcopy(bigm,tsigma(i,l) ,ntotal,qplima(i,il) ,nobs) 


return 

end 


****************************v***********>  ************************* 

*  DSLIVE  * 

*******************************************  *********************** 

subrout i ne  dsl i ve ( qpl ive ,nto  ta 1 , z , nobs ,qpl ima ) 

double  precision  qpl ive (ntotal) ,z (nobs) ,qpl ima ( nobs, ntotal) 

integer  ntotal, nobs 

* 

#  Compute  -z'*<plima  -  linear  term  in  quadratic  function 

t 

do  i-1, ntotal 


qplive(i)  — ddot(nobs,z,l,qplima(l,i)  ,1) 
return 
end 


******************************************************************* 

DSLMIN  * 

**************************************************************  **** 

subroutine  dslmin ( vofl am , nval am , lmin in ) 
double  precision  vofl  am  (nval  am) 
integer  lminin(nvalam) 

* 

#  This  routine  determines  for  which  indices 

#  in  voflam  there  is  a  local  minimum  and  sets 

#  the  corresponding  entry  of  lminin  to  1 

# 

if  (nval am  <*  3)  return 

if  (voflam(l)  <  voflam (2)  &  (voflam(l)  >  0)) 
lminin (1)  ■  1 
else 

lminin (1)  ■  0 

if  ((voflam(nvalam)<voflam(nvalam-l))  &  (voflam (nval am)  >  0)) 
lminin  (nval am)  *  1 
else 

lminin  (nval am)  *  0 
do  i*2,nvalam-l  { 

if  ( (voflam  (i)<*voflam(i-l))  &  (voflam  (i)  <»voflm(i+l)  )&  (voflam  ( i)  >0) ) 
lminin  (i)  ■  1 
else 

lminin  (i)  ■  0 

} 

return 


******************************************************************* 

#  DSMACH  * 

******************************************************************* 

subroutine  dsmach(  wmach  ) 
double  precision  wmach (15) 

#  mchpar  must  define  the  relevant  machine  parameters  as  follows. 

#  vanach(l)  ■  nbase  *  base  of  floating-point  arithmetic. 

#  wmach ( 2 )  ■  ndigit  ■  no.  of  base  wmach(l)  digits  of  precision. 

#  wmach (3)  *  epsmch  *  floating-point  precision. 

#  wmach ( 4 )  ■  rteps  =  sqrt ( epsmch) . 

#  wmach  (5)  »  flmin  *  snallest  positive  floating-point  number. 

#  wmach(6)  »  rtmin  ■  sqrt ( flmin) . 

#  wmach (7)  ■  flmax  ■  largest  positive  floating-point  number. 


#  wmach(8)  *  rtmax  *  sqrt(flmax) . 

#  wnach(9)  =  undflw  *  0.0  if  underflow  is  not  fatal, +ve  otherwise. 

#  wmach(lO)  «  nin  *  standard  file  number  of  the  input  stream. 

#  wmach(ll)  ■  nout  *  standard  file  number  of  the  output  stream. 

double  precision  dsqrt 

# 

nbase  -  2 

ndigit  *  56 

wmach(l)  -  nbase 

wmach(2)  =■  ndigit 

vmach(3)  *  wmach(l)**(l  -  ndigit) 

*mach(4)  «  dsqrt(wmach(3) ) 

vmach(5)  «  wmach(l)**(-128) 

v*nach(6)  ■  dsqrt(vmach(5) ) 

v*nach(7)  **  wmach(l)**126 

wmach(8)  *  dsqrt(v*nach(7) ) 

vmach(9)  *  0.0 

nin  »  5 

nout  ■  8 

wmach(lO)  ■  nin 

wmach(ll)  *  nout 

return 

end 


******************************************************************* 

#  DSPOLY  * 

***************^********************** ***************************** 

subroutine  dspoly  (dim,m,powers,ldp) 

#  Record  powers  of  polynomials. 

# 

integer  powers(ldp,l) ,pl,p2,p3,p4,p5,p6, dim, m,ldp, count 
count-0 

if  (dim  —  1)  { 
do  pl*0,m-l  { 
count-count+1 
powe  rs ( 1 , count) -pi 

} 

return 

} 

if  (dim  —  2)  { 
do  pl®0,m-l 

do  p2-0,m-l-pl  { 
count*count+l 
pows  rs ( 1 , coun t ) =pl 
powers (2, count) -p2 


} 
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return 

} 

if  (dim  —  3)  { 
do  pl-0,m-l 

do  p2^,m-l-pl 

do  p3*0,m-l-pl-p2  { 
count«count+l 
powers (lf count) «pl 
powers (2, count) =p2 
powers (3, count) *p3 

return 

} 

if  (dim  —  4)  { 
do  pl«0,m-l 

do  p2»0,m-l-pl 

do  p3»0,m-l-pl-p2 

do  p4«Orm-l-pl-p2-p3  { 
count~count+l 
powers (1 , count) *pl 
powers (2, count) *p2 
powers  (3 ,  count)  =*p3 
powers  (4 ,  count)  *p4 

} 

return 

} 

if  (dim  —  5)  { 
do  pl*0,tn-l 

do  p2*0,m-l-pl 

do  p3»0,in-l-pl-p2 

do  p4-0,m-l-pl-p2-p3 

do  p5^3,m-l-pl-p2-p3-p4  { 
coint*count+l 
powers (1, count) *pl 
powers (2, count) *p2 
powers (3, count)  *p3 
powers (4 , count) *p4 
powers  (5,  count)  -p5 

} 

return 

} 

if  (dim  —  6)  { 
do  pW),m-l 

do  p2»0,m-l-pl 

do  p3«0,m-l-pl-p2 

do  pW),m-l-pl-p2-p3 

do  p5-0,m-l-pl-p2-p3-p4 

do  p€*0,m-l-pl-p2-p3-p4-p5  { 
count»count+l 
powers (1, count) =pl 
powers (2, count) »p2 
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powers  (3 , count)  *p3 
powers (4, count) *p4 
powers (5, count) *p5 
powers (6, count) »p6 

} 

return 

} 

end 


******************************************************************* 

#  DSCfTAQ  * 

******************************************************************* 

subroutine  dsqtaq  ( t , ldt  ,n  ,m , qr aux  ,a ,  Ida , q2taq2,  ldqaq , workv , wor km) 
double  precision  t(ldt,m) ,qraux(m) ,a(lda,n) ,workv(n) , 
q2taq2 (ldqaq, n-m) ,workm(lda,n) 
integer  ldt, n, in, Ida,  ldqaq 

#  This  routine  forms  the  product 

#  IQ  ‘I 

#  |  1  I  *  A  |Q  Q  |  -  Q'  ‘  A  *  Q 

#  IQ  'I  I  1  2| 

#  I2| 

#  and  stores  Q  '  *  A  *  Q 

#  2  2 

f  in  the  matrix  q2taq2. 

#  To  save  storage  the  matrix  a  can  also  be  used  as  workm 

#  in  which  case  on  return  a  is  destroyed 


*  Get 

*  workm-Q ' *A 

* 

jotHJIOOO 
do  j«l,n 

call  dqrsl(t,ldt,n,m,qraux,a(l,  j)  ,dmy,workm(l, j)  ,dmy,dmy, 
dmy,  job,  info) 

#  Post-multiply  the  last  i-m  rows  by  Q 

*  2 

#  and  put  in  Q'AQ,  this  is  done  by 

#  premultiplying  the  last  n-m  columns  of 

#  workm'  by  Q*  and  then  taking  the  last 

#  n-m  rows  of  the  resulting  matrix 


nnirmwi-m 
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for  (i-mfl;  i<-n;  i-i+1)  { 
il-i+m 

#  copy  ith.  row  of  workm  to 

#  workv(-  ith. col.  of  workm') 
call  dcopy(n,workm(i,l) ,lda, workv ,1) 

#  put  Q'*ith  row  of  workm  in  workv 

call  dqrsl(t,ldt,n,m,qraux,workv,dmy, workv  ,dmy,dmy,dmy, job, info) 

#  copy  last  n-m  els.  of  workv  to  (i-m)th  col. 

#  of  q2taq2 
nmiram-mfl 

call  dcopy(nminm,workv(mfl)  ,l,q2taq2(l,i-m)  ,1) 

} 

return 


******************************************************************* 

*  DSRD1  * 

******************************************************************* 

subroutine  dsrdl  (ntotal , nobs, neons, bigm, dim, nmbigm,hessl, 
hess2,qpl  ima ,qpcons , tsigqr ,qrtsi a , 
esigma, terror) 

double  precision  hessl (ntotal, ntotal)  ,hess2(nmbigm,nmbigm) , 
qpl ima (nobs, ntotal) ,qpcons (neons, ntotal) , 
tsigqr  (ntotal, bigm)  ,qrtsia(bigm) , 
esigma  (ntotal, ntotal) 
integer  ntotal ,nobs,ncons,bigm,dim, terror 

# 

#  Read  intermediate  results  from  file  associated  to  unit  1 

* 

f  If  first  input  record  does  not  agree  with  the  values  from 

#  the  calling  program  terror  is  set  to  10  and  nothing  is  done 

* 

#  If  there  is  insufficient  data  in  file  ierr  is  set  to  11 

#  and  the  routine  returns  at  that  point 

# 

read  (l,end**100)  nt,no,nc,mbig,nd 

if  (no  !»  nobs  I  nc  J«  neons  I  mbig  l*  bigm  I  nd  !*  dim) { 
terror  -10 
return 

} 

nmbigmpnto tal-bigm 

read  (1, end-100)  ( (hessl (i,j) ,i-l, ntotal) ,j«l, ntotal) 
read  (1, end-100)  ( (hess2(i,j) ,i-l,nmbigm) ,j»l,nmbigm) 
read  (1, end-100)  ((qplima(i,j) ,1-1, nobs) ,j«l, ntotal) 
read  (1, end-100)  ((qpcons(i,j) ,i-l, neons) ,j-l, ntotal) 
read  (1, end-100)  ( (tsigqr(i,j) ,i-l, ntotal) ,j-l, bigm) 
read  (1, end-100)  (qrtsia(i) ,i-l,bigm) 
read  (1, end-100)  ( (esigma(i,j) ,i-l,ntotal) ,j-l, ntotal) 
terror  «  0 


return 

* 

#  error  return  -  insufficient  data 

* 

100  ierror  *  11 
return 


******** ************* ******************** ****** ******************** 

*  DSRDS  * 

******************************************************************* 

subroutine  dsrds(x,ldx,dim, nobs, s, Ids, neons, z,m,nvalam,istate, 
nactiv , i actch , powers ,ldp,bigm,coef ,hhat , 
voflam,lanvec) 

double  precision  x(ldx,l) ,s(lds,l) ,z(l) ,coef (1) ,hhat(l) , 
voflam(l) ,lamvec(l) 

integer  ldx  ,d  im  ,nobs ,  Ids  ,ncons  ,m , rival  am ,  i  state  ( 1 )  ,nactiv,iactch(l) , 
powers(ldp,l) ,ldp,bigm 

* 

#  Read  solution  obtained  by  dscotnp  in  file  splsol 

* 

open  (3,file*'splsol' , form- 'unformatted' ) 
rewind  (3) 

read  (3)  dim, nobs, ncons,m,bigm,nvalam, nactiv, thetam 
read  (3)  ( (x(i,j) ,i-l, dim) ,j«l, nobs) 
read  (3)  ( (s(i,j) ,i-l,dim) ,j-l, neons) 
read  (3)  (z(i) ,i-l,nobs) 

read  (3)  (istate(i)  ,i«nobe+ncons+'l,nobs+2*ncons) 

read  (3)  (iactch(i)  ,i«l,nvalan) 

read  (3)  ( (powers(i,j) ,i«l,dim) ,j»l,bigm) 

read  (3)  (coef (i) ,i«l,nobsfncons+bigm) 

read  (3)  (hhat (i) ,i«l,nobs) 

read  (3)  (voflam(i) /i-lfnvalam) 

read  (3)  (lamvec(i) ,i*l,nvalam) 

return 


******************************************************************* 

*  DSSOLV  * 

******************************************************************* 

subroutine  dssolv(hessl,ntotal,hess2,nmbigm,bigm,qplima, 
qpl ive ,qpcons , neons , nobs , ts igma , 
qrtsil ,lamvec,voflam, indlam,coef , ioptv, iqpout , 
nvalle ,nval r i ,nvalam , zero ,f ini ty ,bl ,bu , 
start , istate , i stant , i wor k , i actch , 
wnach , bigeig , i tmax , hhat , z , es igma , nactiv ,e igen , 
work,ierr) 


external  dshemu 
common  /block/nobser 
logical  start 

integer  ntotal  ,nmbigm,bigm,ncons,nobs, ntotal, indlam,ioptv,iqpout, 
nvalle,nvalri,nvalam,istate(l)  ,istant(l)  ,iwork(l) , 
iactch ( 1 ) / i tmax ,nactiv 

double  precision  hessl (ntotal, 1) ,hess2(rmbigm,l) ,qplima(nobs,ntotal) , 
qplima(nobs, ntotal) ,qpl ive(ntotal) , 
tsigma  (ntotal  ,bigm)  ,qrtsil  (bigm)  ,lamvec(  rival  am) , 
voflam(nvalam) ,coe£ (ntotal) ,zero,f inity, 
bl  (ncons+-n  total) , bu ( neon s+n total)  ,wmach (15) , 
bigeig,hhat (nobs) ,z(nobs) ,esigma  (ntotal , ntotal) , 
work(l) , vnum, lambda, eigen (1) 

# 

#  This  routine  solves  the  actual  problem  optimizing  with  respect  to 
t  lambda  if  necessary.  The  routine  qpfc  is  called  from  this  routine 

* 

ierr-0 

lambda-lamvec ( 1 ) 
if  (ioptv  —  0)  { 
nvalle-0 
nvalri-0 
rival  an- 1 

lamvec  ( 1 )  «dlogl  0  (lambda) 

} 

else 

#  Construct  regular  grid  of  values  of  log (lambda) 
call  dsgrla ( lambda, lanrvec,nvalle,nvalri , nval am, bigeig, ntotal , zero) 

maglvl-iqpout 

msglv^iqpout 

maacact*ncons 

n*ntotal 

ncl  in-ncons 

nctotl-nfnclin 

liwork-nctotl 

lwork-n*nf  8*n+ncl  im-  (nclinfl)*  (nclin+l)+n*  (nclin+l)+l 

nfree  ■  n 

featol-wmach(4) 

* 

f  set  coef  to  initial  guess 

#  (Recall  that  work(l)  is  used  as  coef 

#  to  have  in  coef  a  minimum  at  every  stage 

# 

do  i*l,n 

work(i)«0.  ldO 

* 

#  initialize  istant 

« 

do  i«l,nclin 
istant ( i) — 1 


I 
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* 

* 

do  i-l,n  { 

bl(i)— finity 
bu(i)-  finity 
istate(i)«0 


iactch  (rival  an)  *0 
vlant-finity 
nobser-nobs 
for  (ival-nvalam;  ival>«l;  ival-ival-1)  { 
nwO-ntotal+1 
nwl*ncons+nwO 
nw2-ntotal+ncons+nwl 
lambda-10. d0**lamvec ( ival) 

call  qpfc(dshemu,hessl,hess2,ntotal ,nmbigm, lambda, start, i ter, itmax, 
liwork,lwork,m9glvl,msglvq,maxact,n,nclin,nctotl,nclin, 
inform,nactiv,nfree,istate,iwork,finity,finity,featol, 
qpcons,work(nwO)  ,bl,bu,work(nwl)  ,qpl  ive,  work,  work  (rtw2) , 
wmach) 

if  (inform  >  0)  { 

vofl am ( ival ) --1 . OdO 
if  (inform  —  200)  inform-5 
iactch  ( ival ) -inform+l 
next 
}  • 

if  (rival  am  —  1)  { 

call  dcopy(n,work,l,coef ,1) 

indlam-1 

break 

} 

ntl-nobsfnactiv 

ntlmnrotl-bigm 

nwl-nwO+ntlnn*ntlinm 

nw2-nwl4ntlran*ntlnn 

vmm-Q.dO 

* 

#  Compute  sunn  of  squares  of  the 

#  residuals  to  form  nunerator  of 

#  Cross-validation  function 

# 

do  i-l,nobs  { 

* 

#  compute  hhat(z(i)  put  in  work(nw0+i-l) 

* 

work (nwO+i-1) -ddot (ntotal ,qpl ima ( i , 1 ) ,nobs ,v»rk ,1) 


initialize  first  n  rows  of  bl,bu  and  istate 


for  each  value  of  lambda  in  lamvec  solve 
quadratic  programming  problem  and  evaluate 
generalized  cross-validation 


temp^rfork(nwO+i-l)-z  ( i) 
vnunFvnun+temp*  temp 

}. 

t 

#  Compute  generalized  cross-validation 

* 

call  dsvofl  (voflam(ival)  ,lambda,vnun,n total ,bigm, nobs, neons,  nmbigm 
qplima,qpcons,esigma,tsigma,qrtsil,istate,istant, 
nactiv,ntl,ntlnm,idifer,neig, eigen, work  (nwO) ,work(nwl) 
work(nw2) ,ierr,iwork) 
if  (ierr  I*  0)  return 

* 

#  If  set  of  active  constraints  charged,  change 

#  the  value  of  iactch(ival) 

I 

if  (idifer  «  1  i  ival  <  nvalam)  { 
if  (iactch(ival+l)  *■  0) 
iactch(ival)«l 

iactch ( ival) *0 

#  Set  start  to  .true,  to  use  current  set  of 

#  active  constraints  as  guess  for  next  value 

#  of  lanbda 

start*. true. 

#  If  current  value  of  v (lambda)  is  smaller 

t  previous  one,  store  current  value  of  ival 

#  in  lndlam  and  current  solution  in  coef 

if  (voflam(ival)  <  vlant)  { 
vlant*vofl  am  ( ival ) 
call  dcopy(n,work,l,coef ,1) 
indlanpival 

} 

# 

#  Go  to  solve  q.p.  problem  for  next  value 

f  of  lambda 

} 

#  Compute  hhat 

I 

do  i*l,nobs 

hhat{i)«ddot(ntotal,qplima(i ,1) , nobs, coef ,1) 
return 
end 
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******************************************************************* 

*  DSTETA  * 

******************************************************************* 


subroutine  dsteta  (dim,ra,  thetam) 

t 

#  get  theta  sub  m  for  dim,m 


integer  dim,m 

double  precision  thetam 

integer  rtab(36),p 

double  precision  pi 

data  pi  /  3. 14159265358979323846d0  / 

data  rtab  /  12,  -240,  10080,  -725760,  79833600,  -12454041600, 
8,  -128,  4608,  -294912,  29491200,  -4246732800, 

-8,  96,  -2880,  161280,  -14515200,  1916006400, 

1,  64,  -1536,  73728,  1,  1, 

1,  -64,  1152,  1,  1,  1, 

1,  1,  768,  1,  1,  1  / 

# 

1  -  6*dim  +  m  -  7 
p  *  dim/2 

thetam  ■  l.OdO  /  (dfloat(rtab(l) )  *  (pi**p)) 

return 

end 
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******************************************************************* 

*  D6TSIG  * 

******************************************************************* 

subroutine  dstsig (x,ldx,s, Ids, power s,ldp, tsigma, ntotal ,bigm, 
nobs ,  neons  ,d  in,  i  sigma ,  sigma) 
double  precision  x(ldx,nobs) ,s (Ids, neons) , 

sigma (nobs),  tsigma (ntotal ,bigm) 
integer  ldx,lds,dim,nobs,isigma,  powers  (ldp,bigm)  ,ldp,bigm, 
ntotal , neons 


Compute: 
tsigma  « 


I  SIGMA 


01  IT  | 
I  I  II 
II  IT  | 
I  I  21 


do  l-l,bigm  ( 
do  j-l,nobs  ( 
prod-1. OdO 
do  i*l,dim  ( 
ij^powers(i,l) 
if  (ip  —  0)  next 
prod  -  prod*(x(i,j))**ip 

#  divide  by  sigma  if  needed 

if  (isigma  —  0)  prod-prod/sigma (j) 
tsigma  (j,l)  -prod 

} 

do  j-1, neons  { 
prod  -  1.  OdO 
do  i-l,dim  ( 
ij^powers(i,l) 
if  (ip  —  0) 

.  next 
else 

prod-prod* (s ( i , j ))** ip 

} 

tsigma  (nobs+ j,l) -prod 


return 


end 
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******************************************************************* 

*  DSVOFL  * 

******************************************************************* 

subroutine  dsvofl (voflam, lambda, vnun, ntotal ,bigm, 

nobs, neons, nmbigm,qplima, qpcons, esigma, 
tsigma,qrtsil,istate,istant,nactiv,ntl,ntlmm, 
idifer ,neig , eigen ,b ,a ,work, ierr , iwork) 
double  precision  vofl am, vnun, qpliroa( nobs, ntotal) , 

qpcons (neons, ntotal) ,esigma(ntotal,n total) , 
tsigma (ntl ,bigm)  ,qrtsil  (bigm)  ,eigen(l) , 
b(ntlnn,ntlmm)  ,a(ntlmm,ntlflm)  ,denom, 
lambda, lam, wo rk(l) ,eigi 

integer  ntl ,ntlmm, ntotal, nobs, neons, nmbigm,nactiv,neig,iwork(l) , 
istate(l) ,istant(l) ,bigm 

♦ 

#  Compute  generalized  cross-validation  function  for 

#  constrained  problems 

# 

lan-lambda 

idifer«0 

# 

#  Check  if  active  constraints  changed 

#  and  update  istant  if  necessary 

4 

do  i»l, neons  ( 

if  (istate(ntotal+i)  i»  istant ( i) )  { 
istant ( i) -istate (ntotal+i) 
idifer-1 

} 

} 

# 

#  If  constraints  did  not  change  skip 

#  generalized  eigenvalue  problem 

4 

if  (idifer  ■*  0)  go  to  100 

4 

4  Recover  T(l)  from  qplima  and  qpcons 

4  and  put  in  tsigma 

4 

do  j*l,bigm  { 

call  dcopy(nobs,qplima(l,nmbigsH-j) ,  1, tsigma (1,  j)  ,1) 

} 

il^Tobs 

do  i«l, neons  { 

if  (istant (i)  1*  0)  { 
il-il+1 

call  dcopy (bigm, qpcons (i,nmbigmfl) , neons, tsigma(il,l)  ,ntl) 

} 

} 

#  Obtain  Q-R  decomposition  of  tsigma 


which  contains  T(l) 


job  -  0 

call  dgrdc(tsigma,ntl,ntl,bigm,qrtsil,jpvt,work, job) 
t  Compute 

#  b  -  Q  *  *  E  (1)  *  Q 

*  2  2 

call  dsgetb(b,ntotalfnobs,ntl,ntlmm,bigm,esigma,tsigma, 
qrtsil,work(l) ,work(ntl+l) ,istant,nactiv) 

#  Get 

#  a  «  Q  '  *  V  *  Q 

#  2  2 

call  dsgeta (a,nobs,ntlan,ntl  ,bigm,tsigma,qrtsil  ,work, 
work(ntl+l)) 

#  .  Solve  generalized  eigenvalue  problem 

call  dseige(ntlom,a,b, eigen/work, ierr) 
if  (ierr  -»  7*ntlmm*-l)  { 

#  b  is  not  pos.  def.  ierr-20 

ierr-20 

return 


'  Note:  eigen  has  the  eigenvalues 

of  -1 

I  '  B  *  A 

i  in  ascending  order 

00  continue 

i  compute  denominator  of  voflam: 

i  dencm  «  (1/nobs)  trace(I-nA(lambda) ) 

i  ntlmm 

t  -(lambda* (SIM 

I  i-1 

I 

)  ( (eigen ( i) /(l+nobs*lambda*eigen ( i) ) ) 


denom-O.dO 

« 


First  compute  trace(I-A(lambda) ) 
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* 

neig-0 

do  i*l,ntlOTi  { 
eigi-eigen(i) 
if  (eigi  >  O.OdO)  { 

n*ig«neig+l 

dencm-denom+eigi/(l. OdO+d float (nobs) *lam*eigi) 

} 

» 

#  neig*nunber  of  positive  eigenvalues 

* 

denotn  -  ( (denon*lao) /dfloat (nobs) ) *denoo*lao 

vofl  w»vnum/dencci 

return 

end 


it**************-**************************************************** 

*  DSWRT1  * 

subroutine  dswrtl (ntotal ,nobs, neons, bigm, dim,nmbigm, hessl, 
hess2,qplima,qpcons,tsigqr,qrtsia, 

•sigma) 

double  precision  hessl {ntotal, ntotal) ,hess2 (nnbigm ,nmbigm) , 
qplima (nobs, ntotal) ,qpco ns (neons, ntotal) , 
tsigqr  (ntotal, bigm)  ,qrtsia(bigm) , 
esigma ( nto  tal , ntotal ) 
integer  ntotal, nobs, neons, bigm, dim,nmbigm 

* 

#  Write  intermediate  results  from  file  associated  to  unit  1 

» 

* 

write  (1)  ntotal, nobs, neons, bigm, dim 
ranbigmmtotal-bigm 

write  (1)  ( (hessl (i,j) ,i»l, ntotal) ,j*l, ntotal) 
write  (1)  ((hess2(i,j) ,i«l,nmbigm) ,j»l, ranbigm) 
write  (1)  ( (qplima (i,j) ,i»l, nobs) ,j*l, ntotal) 
write  (1)  ((qpcons(i, j) ,i«l, neons) ,j«l, ntotal) 
write  (1)  ((tsigqr(i,j) ,i«l, ntotal) ,j»l, bigm) 
write  (1)  (qrtsi*(!) ,i»l,bigm) 
write  (1)  ((esigma(i,j) ,i-l, ntotal) ,j»l, ntotal) 
return 


******************************************************************* 

*  DSWRTS  * 

******************************************************************* 

subroutine  dswrts(x,ldx /dim, nobs ,s, Ids, neons, z,m,nvalan,istate, 
nactiv,iactch,powers,ldp,bigm,coef ,hhat, 
voflam,lamvec) 

double  precision  x(ldx,l) ,s(lds,l) ,z(l) ,coef (1) ,hhat(l) , 
voflam(l)  ,lamvec(l) 

integer  ldx, dim, nobs, Ids, neons ,n,nvalam,istate(l) ,nactiv,iactch(l) , 
powers  (ldp,l) ,ldp,bigm 

* 

#  Write  solution  obtained  by  d scamp  in  file  splsol 

# 

open  (3, file*' splsol* ,formF* unformatted* ) 
rewind  (3) 

write  (3)  dim, nobs, ncons,m,bigm,nvalam,nactiv,thetam 
write  (3)  ( (x(i, j) ,i«l,dim) , j-l,npbs) 
write  (3)  ( (s(i,j) ,i-l,dim) ,j»l, neons) 
write  (3)  (z(i) ,i*l,nobs) 

write  (3)  (istate(i) ,i«nobs+ncons+l,nobs+2*ncons) 

write  (3)  (iactch(i) ,i«l,nvalao) 

write  (3)  ((powers(i,j)  ,i«l,dim)  ,j"l,bigm) 

write  (3)  (coef (i) ,i»l,nobs+ticons+bigra) 

write  (3)  (hhat(i) ,i»l,nobs) 

write  (3)  (voflam(i) ,i»l,nvalam) 

write  (3)  (lamvec(i) ,i-l,nvalam) 

return 

end 


A****************************************************************** 

*  DSZDIS  * 


******************************************************************* 

subroutine  dszdis  (zdata, nobs, sigma) 

# 

$*****  scale  zdata  values  ***** 

******  dszdis  divides  zdata  by  sigma  ***** 

* 

implicit  double  precision  (a-h,o-z) 
dimension  ydata(l)  ,sigma(l) 

* 

» 

******  divide  by  sigma  ***** 
do  i«l,nobs 

ydata ( i) «ydata ( i) /sigma ( i) 
return 
end 
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******************************************************************* 

*  DSZMUS  * 

******************************************************************* 

subroutine  dsanus  (zdata, nobs, sigma) 

# 

******  scale  zdata  values  ***** 

******  dsanus  multiplies  zdata  by  sigma  ***** 

* 

implicit  double  precision  (a-h,o-z) 
dimension  zdata(l) ,sigma(l) 

* 

* 

* 

******  multiply  by  sigma  ***** 
do  i-l,nobs 

zdata ( i) »zdata ( i) *sigma ( i) 
return 
end 
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ABSTRACT 


A  nonparametric  estimate  for  the  posterior  probabilities  in  the 
classification  problem  using  multivariate  smoothing  splines  is  proposed. 
This  estimate  presents  a  nonparametric  alternative  to  logistic  discrimi¬ 
nation  and  to  survival  curve  estimation.  It  is  useful  in  exploring  proper¬ 
ties  of  the  data  and  in  presenting  them  in  a  way  comprehensible  to  the 
layman. 

The  estimate  is  obtained  as  the  solution  to  a  constrained  minimization 
problem  in  a  reproducing  kernel  Hilbert  space.  It  is  shown  that  under  certain 
conditions  an  estimate  exists  and  is  unique. 

A  Monte  Carlo  study  was  done  to  compare  the  proposed  estimate  with  the 
two  par  zone  trie  estimates  most  commonly  used.  These  parametric  estimates 
are  based  on  the  assumption  that  the  distributions  involved  are  normal.  The 
spline  estimate  performed  as  well  as  the  parametric  estimates  in  most  cases 
where  the  distributions  involved  were  normal.  In  the  case  of  non-normal  distri¬ 
butions  the  spline  performed  consistently  better  than  the  parametric  estimates. 

The  computational  algorithm  developed  can  be  used  in  the  more  general 
context  of  estimating  a  smooth  function  h,  when  we  observe 
Zi  =  +  cit  is l,n ,  where  c*'s  are  independent,  zero  mean  and  finite  vari¬ 

ance  random  variables,  Li's  are  linear  functionals,  and  the  solution  is  known  a 
priori  to  be  in  some  closed,  convex  set  in  the  Hilbert  space,  for  example,  the  set 
of  non-negative  functions,  or  the  set  of  monotone  functions.  This  type  of  prob¬ 
lem  arises  in  areas  such  as  cancer  research,  meteorology  and  computerized 
tomography. 

We  also  consider  the  estimation  of  the  logarithm  of  the  likelihood  ratio  by  a 
penalized  likelihood  method.  Existence  and  uniqueness  of  an  estimator  under 
certain  conditions  is  shown.  However,  a  data  based  method  to  estimate  the 
"correct"  degree  of  smoothness  of  the  estimator  is  not  given. 
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